ABSTRACT


Some Bayesian decision models which involve a finite Markov chain
with uncertain transition probabilities are studied in this report. The
principal theoretical features of these models are set forth and various
questions of numerical computation are considered.


It is assumed, for the most part, that the family of prior distributions
of the matrix of transition probabilities is closed under sampling. This
concept is defined and some properties of closed families of distributions
are obtained. It is shown that there are an arbitrarily large number of
such families, giving considerable generality to the entire study. A
discounted adaptive control model for a Markov chain with alternative
transition probabilities and rewards is then formulated as a set of
functional equations. These equations are shown to have a unique bounded
solution, and a method of successive approximations is considered which
converges monotonically to this solution.


The means, variances, and covariances of the n-step transition
probabilities, the steady-state probabilities, the total discounted reward
vector, and the process gain are then considered. It is shown that,
under quite general conditions, the mean n-step transition probability
matrix approaches the matrix of mean steady-state probabilities as
$n \to \infty$. These results are applied to discounted terminal control
models in which a Markov chain with alternative transition probabilities
and rewards is sampled, at a cost, until a terminal decision point is
reached. At that time a terminal policy is chosen and the system is
operated indefinitely under this policy with no further sampling. It is
shown that a terminal decision point is reached with probability one under
an optimal sampling strategy. These models are formulated as functional
equations, which are shown to have a unique bounded solution, and
successive approximation techniques are investigated.


We then turn to fixed-sample-size analysis. The Whittle distribution,
the matrix beta distribution, and the beta-Whittle distribution are
introduced. It is assumed that a finite Markov chain with uncertain
transition probabilities is observed for n consecutive transitions, and
the prior-posterior and preposterior analysis is developed.

CHAPTER 1
INTRODUCTION


1.1 Bayesian Decision Theory and Markov Chains.
The basic concept of a Markov chain was introduced by A. A. Markov
in 1907, and since that time the literature on the subject has grown
[...]. Fundamental investigations by Kolmogorov in the 1930's [...].
The present state of the theory of Markov chains is summarized in [12].

By 1950 it was well recognized that the Markov chain is a useful
model for a multitude of physical processes, and an increasing number of
applications of the mathematical theory have been made to problems in
[...]. [...] and estimation have been investigated. These results are
summarized by Billingsley [10], who gives extensive [...].

During the past two decades Savage's interpretation of the work of
de Finetti on subjective probability has renewed interest in Bayesian
decision theory. Contributions in this area have been made by many
researchers, including von Neumann, Wald, Blackwell, and Girshick, leading
to the current work of Raiffa and Schlaifer [33], which, to a large degree,
presents a unified theory of statistical decisions which is suitable for
applications.
Recent research at the Massachusetts Institute of Technology [13, 14,
38] has been directed toward the application of Bayesian decision theory
to various models based on Markov chains with uncertain transition
probabilities. These efforts have demonstrated both the feasibility of
such decision models and the need for a more thorough investigation of the
underlying mathematical theory. The present work attempts to establish a
theoretical basis for some decision models which involve a finite Markov
chain with uncertain transition probabilities; particular attention is
given to sequential decision models. While we have dealt, for the most
part, with matters of existence and convergence, the question of numerical
computation has not been neglected. There are, however, many computational
problems in this area which are yet to be solved.
In 1953, L. S. Shapley [36], using a game-theoretic formulation,
[...] with alternative transition probabilities, which were assumed to be
known.


Silver [30] has investigated various questions in a Markov chain
with uncertain transition probabilities and rewards. In particular, he
has treated the problem of a natural conjugate distribution for the
data-generating process of a Markov chain and has attempted to find the
expected values of certain functions of the transition probabilities, such
as the steady-state probability vector. These results assumed a specific
prior distribution for the transition probabilities, a generalization of
the beta distribution which we shall call the matrix beta distribution.
Many of Silver's results are generalized in the present work.
Cozzolino [13] has examined a sequential decision model involving a
two-state chain with uncertain transition probabilities. In a related
study, Cozzolino, Gonzalez-Zubieta, and Miller [14] have developed
methods for treating sequential decisions in a Markov chain with
uncertain transition probabilities. Their findings are based on Monte
Carlo studies.
The results of the present study are obtained under the assumption
that the prior distribution function of the matrix of transition
probabilities belongs to a family of distributions which is closed under
consecutive sampling. This concept is formally defined in Chapter 2, and
some properties of such families of distributions are derived.

In Chapter 3 we consider a discounted adaptive control model in
which alternative transition probabilities in a Markov chain are sampled
over an infinite time period. The problem of choosing a sequence of policies
which maximizes the expected discounted reward over an infinite period
is formulated in terms of a set of functional equations. It is shown
that these equations have a unique solution, and a method of successive
approximations which converges monotonically to this solution is considered.
Certain functions of the transition probabilities, such as the n-step
transition probabilities, the steady-state probabilities, the discounted
total reward, and the gain, are treated in Chapter 4, where we obtain
recursive equations for the means, variances, and covariances of these
quantities. An important result of this chapter is the demonstration that,
under quite general conditions, the mean n-step transition probability
matrix approaches the matrix of mean steady-state probabilities as
$n \to \infty$. These results are applied in Chapter 5, where discounted
and undiscounted terminal control models are studied. In these models of a
Markov chain with alternative transition probabilities the decision-maker
can sample various alternatives by paying a sampling cost. After a
sufficient amount of information about the process is gained in this
manner it becomes profitable for him to cease sampling and to choose a
policy under which the system operates indefinitely. These models are
formulated as functional equations, and it is shown that, with probability
one, a terminal decision point is reached under an optimal sampling
strategy. We then show that there exists a unique solution to the
functional equations and investigate a method of successive approximations.
The results of the first five chapters are obtained for any prior
distribution function which belongs to a family closed under consecutive
sampling. In Chapters 6-8 we consider a specific distribution for the
transition probabilities which we call the matrix beta distribution. This
distribution is defined in Chapter 6 and its main properties are derived.
We also introduce, in this chapter, the Whittle distribution and the
beta-Whittle distribution. These probability distributions are utilized
in Chapter 7, where we develop prior-posterior and preposterior analysis for
a Markov chain which is observed under the consecutive sampling rule. The
transition count is identified as a sufficient statistic and is shown to
have the Whittle distribution, conditional on a fixed value of the
transition probability matrix. The natural conjugate distribution for
this data-generating process is the matrix beta, and the unconditional
[...].

In Chapter 8 we consider the results of Chapters 2-6 in the case of
a two-state Markov chain when the prior distribution of the transition
[...].

The results of this study are summarized in Chapter 9 and areas for
future research are discussed.


The matrix with generic element $p_{ij}$ is denoted $P = [p_{ij}]$; the row
vector with generic element $p_j$ is written $\mathbf p = (p_1, \ldots, p_N)$.
The vector $\mathbf p$ is a point in the Euclidean space $E_N$ and has the norm

$$\|\mathbf p\| = \Big(\sum_{j=1}^{N} p_j^2\Big)^{1/2}. \qquad (1.2.1)$$

Similarly, the $M \times N$ matrix $P$ is a point in $E_{MN}$ and has the norm

$$\|P\| = \Big(\sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij}^2\Big)^{1/2}. \qquad (1.2.2)$$


Random quantities are denoted by tildes; thus $\tilde P$, $\tilde{\mathbf p}$, and
$\tilde p_j$ are, respectively, a random matrix, a random vector, and a random variable.

Let $h(P)$ be a scalar function of the $M \times N$ matrix $P$. Assume that
each row of $P$ is subject to the constraint

$$\sum_{j=1}^{N} p_{ij} = 1, \qquad i = 1, \ldots, M. \qquad (1.2.3)$$

If $F(P)$ is a distribution function, the Riemann-Stieltjes integral of $h$
is to be interpreted as an $M(N-1)$-fold integral iterated over the independent
elements of $P$:

$$\int h(P)\, dF(P) = \int \cdots \int h\Big(p_{11}, \ldots, p_{1,N-1}, 1 - \sum_{j=1}^{N-1} p_{1j}, \ldots, p_{M1}, \ldots, p_{M,N-1}, 1 - \sum_{j=1}^{N-1} p_{Mj}\Big)\, dF(p_{11}, \ldots, p_{M,N-1}). \qquad (1.2.4)$$
If $H(P) = [h_{ij}(P)]$ is a matrix-valued function of $P$, the integral
of $H$ is to be interpreted as the matrix of the integrals of each element.

[...] In each state $i$ the decision-maker can choose one of $K_i$ alternative
transition vectors, $\mathbf p_i^k = (p_{i1}^k, \ldots, p_{iN}^k)$, where $p_{ij}^k$
is the probability that the system makes a transition to state $j$, given that
it is currently in state $i$ and the $k$th alternative is used. The vectors
$\mathbf p_i^k$ are stochastic vectors; that is,

$$p_{ij}^k \ge 0, \qquad k = 1, \ldots, K_i;\; i, j = 1, \ldots, N, \qquad (1.2.6a)$$

$$\sum_{j=1}^{N} p_{ij}^k = 1, \qquad k = 1, \ldots, K_i;\; i = 1, \ldots, N. \qquad (1.2.6b)$$

With each transition vector $\mathbf p_i^k$ is associated a reward vector,
$\mathbf r_i^k = (r_{i1}^k, \ldots, r_{iN}^k)$, where $r_{ij}^k$ is the reward
earned when the system makes a transition from state $i$ to state $j$ under the
$k$th alternative ($k = 1, \ldots, K_i$; $i, j = 1, \ldots, N$).

A policy, consisting of the selection of one alternative in each state, may
be expressed as a row vector $\delta = (\delta_1, \ldots, \delta_N)$, where
$\delta_i$ is the index of the alternative selected in state $i$
($\delta_i = 1, \ldots, K_i$). The stochastic matrix which governs the
transitions of the Markov chain under the policy $\delta$ will be denoted by
$P(\delta)$ or, if no confusion will result, by $P$. The corresponding reward
matrix under policy $\delta$ is $R(\delta)$ or $R$. The set of all possible
policy vectors, $\delta$, is denoted $\Delta$ and is a finite set.
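To make this bookkeeping concrete, the sketch below enumerates the policy set $\Delta$ and assembles $P(\delta)$ from the alternative transition vectors. It is a minimal Python illustration under stated assumptions: the names `alternatives`, `policy_set`, and `policy_matrix` are hypothetical, the numbers are arbitrary, and states and alternatives are indexed from 0 rather than 1.

```python
from itertools import product

# Hypothetical example data: alternatives[i][k] is the transition vector p_i^k
# (0-based state i, 0-based alternative k).
alternatives = [
    [[0.5, 0.5], [0.9, 0.1]],  # state 1 offers K_1 = 2 alternatives
    [[0.3, 0.7]],              # state 2 offers K_2 = 1 alternative
]


def policy_set(alternatives):
    """Return the finite set Delta of policy vectors delta = (delta_1, ..., delta_N)."""
    return list(product(*(range(len(alts)) for alts in alternatives)))


def policy_matrix(alternatives, delta):
    """Row i of P(delta) is the transition vector of alternative delta_i in state i."""
    return [alternatives[i][k] for i, k in enumerate(delta)]


Delta = policy_set(alternatives)
print(len(Delta))                           # K_1 * K_2 policies
print(policy_matrix(alternatives, (1, 0)))  # alternative 2 in state 1, alternative 1 in state 2
```

Because a policy picks one of $K_i$ alternatives in each state independently, $\Delta$ contains $K_1 K_2 \cdots K_N$ elements, which is why it is a finite set.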
The matrix $\mathcal P = [p_{ij}^k]$, whose rows are the transition vectors
$\mathbf p_i^k$, can be regarded as the parameter of a Markov chain with
alternatives; uncertainty about $\mathcal P$ is expressed by regarding
$\mathcal P$ as a random matrix $\tilde{\mathcal P}$ with a prior distribution
function, $H(\mathcal P \mid \psi)$, which has the parameter $\psi$. In general,
$\psi$ is a point in a multidimensional Euclidean space. The range set of
$\tilde{\mathcal P}$ is the set of all $K \times N$ generalized stochastic
matrices ($K = K_1 + \cdots + K_N$), denoted $\mathcal S_{KN}$. We remark that
$\mathcal S_{KN}$ is a closed and bounded, hence compact, subset of $E_{KN}$.
[...] of a multivariate distribution function; in particular,

$$\int_{\mathcal S_{KN}} dH(\mathcal P \mid \psi) = 1. \qquad (1.2.7)$$

From $H(\mathcal P \mid \psi)$ can be obtained the marginal distributions of the
$N \times N$ stochastic matrices $P(\delta)$. The marginal distribution function
of $\tilde P(\delta)$ is denoted $F(P \mid \delta)$ or, when the dependence on
$\delta$ is clear, simply $F(P)$. The range set of $\tilde P$ is $\mathcal S_{NN}$,
the set of all $N \times N$ stochastic matrices.

CHAPTER 2
FAMILIES OF DISTRIBUTIONS CLOSED
UNDER SAMPLING



















Much of the discussion presented in the following chapters is
carried out under the assumption that $H(\mathcal P \mid \psi)$, the prior
distribution of $\tilde{\mathcal P}$, is a member of a family of distributions
closed under a given sampling rule. We formally define this concept in the
present chapter and derive some properties of such closed families of
distributions which will be used in the sequel.

The notion of a family of distributions closed under sampling is,
of course, not a new one. In Great Britain, G. A. Barnard [5] in 1954
and, more recently, G. B. Wetherill [40], have applied this concept to
sampling inspection problems. In this country, R. Bellman [6] and Bellman
and Kalaba [8] have used the idea in connection with adaptive control
processes. A particular class of distributions closed under sampling,
the natural conjugate distributions, forms the basis of recent
work by Raiffa and Schlaifer [33] in statistical decision theory.
The properties of closed families of distributions which are derived
in this chapter and their application to decision problems in a Markov
chain with alternatives are original with the present work.

2.1 Families of Distributions Closed Under a Sampling Rule.

Consider a sequence of transitions within a Markov chain with
alternatives. A sampling rule is a set of specifications which determine
the following:

a. The distribution of the initial state of the chain and
the initial policy under which the process is operated.
b. The transitions at which policy changes occur. These
transitions may be determined probabilistically.
c. The distribution of the new policy when a policy change
occurs. This distribution is a probability mass function over
the set of policies, $\Delta$, and allows for randomized selection of
policies.
d. The transitions at which the state of the process is made
known to the decision-maker. These transitions may be determined
probabilistically and, when they do occur, an observation of the
process is said to have taken place. Thus, an observation of the
process is a random variable whose range is the set of state
indices $\{1, \ldots, N\}$.
e. A rule for termination of sampling.


We adopt the convention that, if a policy change or an observation
occurs at the $n$th transition, it takes place immediately after the $n$th
transition has occurred.

There are two sampling rules which are of particular importance in
succeeding chapters: consecutive sampling and $v$-step sampling.

A consecutive sampling rule of size $n$ is characterized as follows.
A specific initial state and initial policy are selected with probability
one. A total of $n$ transitions are to occur, with $n$ selected in advance.
Each transition is observed. Policy changes, if they occur, take place
at predetermined transitions and, at each change, a predetermined policy
is chosen with probability one. Thus, a consecutive sampling rule of
size $n$ consists of $n$ consecutive observations of the states of a Markov
chain with alternatives under a sequence of policies which is selected
in advance of sampling.
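The consecutive sampling rule can be simulated directly. The sketch below is a minimal Python illustration, assuming 0-based state and alternative indices, a hypothetical helper `consecutive_sample`, and a policy sequence fixed before sampling; it records every state and the triple (i, k, j) of each transition, which is exactly the information the rule makes available.

```python
import random


def consecutive_sample(alternatives, x0, policies, rng):
    """Simulate a consecutive sampling rule of size n = len(policies).

    alternatives[i][k] is the transition vector p_i^k (0-based indices) and
    policies[t] is the policy in force for transition t + 1, fixed before
    sampling begins. Every transition is observed.
    Returns the states (x_0, ..., x_n) and the triples (i, k, j) actually used.
    """
    states, transitions = [x0], []
    for delta in policies:
        i = states[-1]
        k = delta[i]                  # alternative prescribed in the current state
        p = alternatives[i][k]
        j = rng.choices(range(len(p)), weights=p)[0]
        states.append(j)
        transitions.append((i, k, j))
    return states, transitions


# Usage with a deterministic two-state chain that always switches state.
switching = [[[0.0, 1.0]], [[1.0, 0.0]]]
states, transitions = consecutive_sample(switching, 0, [(0, 0)] * 4, random.Random(1))
print(states)  # [0, 1, 0, 1, 0]
```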
A $v$-step sampling rule of size $n$ may be described as follows. A
positive integer $n$, a sequence of $n$ positive integers, $\{v_1, \ldots, v_n\}$,
and a sequence of $n$ policies, $\{\delta_1, \ldots, \delta_n\}$, are selected
in advance of sampling. We allow the possibility that some or all of the
$\delta_t$ are equal. A specific initial state is chosen with probability one
and a sequence of $v_1$ transitions is allowed to occur under the policy
$\delta_1$. The state of the Markov chain is observed after the $v_1$th
transition. Then $v_2$ transitions occur under policy $\delta_2$, the state
being observed after the last of these $v_2$ transitions, and so on.
The $v$-step sampling rule will be used in one of the terminal
control models of Chapter 5.
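A $v$-step sample differs from a consecutive one only in which states are recorded. The following sketch assumes 0-based indices and a hypothetical helper `v_step_sample`; it lets $v_t$ transitions occur under $\delta_t$ and records the state only at the end of each block.

```python
import random


def v_step_sample(alternatives, x0, steps, policies, rng):
    """Simulate a v-step sampling rule: v_t unobserved transitions under delta_t,
    after which only the current state is recorded.

    steps = [v_1, ..., v_n] and policies = [delta_1, ..., delta_n] are fixed in
    advance; alternatives[i][k] is p_i^k with 0-based indices.
    Returns the observations (x_0, x_1, ..., x_n).
    """
    observed, state = [x0], x0
    for v, delta in zip(steps, policies):
        for _ in range(v):
            p = alternatives[state][delta[state]]
            state = rng.choices(range(len(p)), weights=p)[0]
        observed.append(state)  # observation made after the last of the v_t transitions
    return observed


switching = [[[0.0, 1.0]], [[1.0, 0.0]]]  # always switches state
print(v_step_sample(switching, 0, [1, 2, 3], [(0, 0)] * 3, random.Random(0)))
# [0, 1, 1, 0]
```

Intermediate states inside each block are never returned, which is why the likelihood of such a sample involves powers of $P(\delta_t)$ rather than one-step probabilities.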
We now proceed with the definition of a family of distributions
closed under a sampling rule. A collection, $\mathcal H$, of probability
distribution functions is said to be a family of distributions indexed by
$\psi$ if all members of the collection have the same functional form and
differ only in the values assigned to the parameter $\psi$. The set of values
which $\psi$ may assume is denoted $\Psi$, termed the admissible parameter set.
The admissible parameter set is assumed to be a connected subset of a
(possibly multidimensional) Euclidean space.

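Let a sampling rule be specified and assume that a sample of $n$
observations, $\mathbf x_n = (x_1, \ldots, x_n)$, has resulted under that
sampling rule. Let $\ell(\mathbf x_n \mid \mathcal P)$ be the likelihood of the
sample $\mathbf x_n$ under the given sampling rule, given that
$\tilde{\mathcal P} = \mathcal P$. Let the prior distribution function of
$\tilde{\mathcal P}$ be $H(\mathcal P \mid \psi)$, a member of $\mathcal H$, a
family of distributions indexed by $\psi$. Then, if $dH(\mathcal P \mid \psi)$
is the prior probability that $\tilde{\mathcal P}$ lies in an infinitesimal
neighborhood of $\mathcal P$, the posterior distribution function of
$\tilde{\mathcal P}$ is $H(\mathcal P \mid \psi, \mathbf x_n)$, defined by means
of Bayes' Theorem:

$$dH(\mathcal P \mid \psi, \mathbf x_n) = \frac{\ell(\mathbf x_n \mid \mathcal P)\, dH(\mathcal P \mid \psi)}{\int_{\mathcal S_{KN}} \ell(\mathbf x_n \mid \mathcal P)\, dH(\mathcal P \mid \psi)}. \qquad (2.1.1)$$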

If $H(\mathcal P \mid \psi, \mathbf x_n) \in \mathcal H$ for all $\psi \in \Psi$
and all samples $\mathbf x_n$ of nonzero probability, then $\mathcal H$ is said
to be closed with respect to the sampling rule which determines
$\ell(\mathbf x_n \mid \mathcal P)$. In this case the posterior distribution is
denoted $H(\mathcal P \mid \psi')$, where

$$\psi' = T(\psi, \mathbf x_n), \qquad (2.1.2)$$

and $T$ is the mapping of $\Psi$ into $\Psi$ induced by the transformation
(2.1.1) when $\mathcal H$ is closed under the given sampling rule.
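A simple way to see the definition at work is a prior concentrated on a finite set of candidate matrices: Bayes' Theorem (2.1.1) then only reweights the same candidates, so the family of all such priors on a fixed finite support, indexed by the weight vector $\psi$, is closed under any sampling rule, and $T$ simply maps old weights to new weights. The sketch below is an illustration under assumptions not in the text: one alternative per state, a consecutively observed path, and hypothetical candidate matrices and helper names.

```python
from fractions import Fraction as Fr


def path_likelihood(P, path):
    """Likelihood of a consecutively observed path under the transition matrix P."""
    like = Fr(1)
    for i, j in zip(path, path[1:]):
        like *= P[i][j]
    return like


def posterior_weights(prior, likelihoods):
    """Discrete form of (2.1.1): posterior weight proportional to prior weight * likelihood."""
    joint = [w * l for w, l in zip(prior, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("the sample has zero prior probability")
    return [x / total for x in joint]


candidates = [
    [[Fr(1, 2), Fr(1, 2)], [Fr(1, 2), Fr(1, 2)]],
    [[Fr(3, 4), Fr(1, 4)], [Fr(1, 4), Fr(3, 4)]],
]
path = [0, 0, 1, 1]
psi = [Fr(1, 2), Fr(1, 2)]
psi_prime = posterior_weights(psi, [path_likelihood(P, path) for P in candidates])
print(psi_prime)  # [Fraction(8, 17), Fraction(9, 17)]
```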

In the special case where the sample consists of a single transition
from state $i$ to state $j$ under the $k$th alternative in state $i$, $\psi'$
will be written

$$\psi' = T_{ij}^k(\psi). \qquad (2.1.3)$$

If a fixed policy $\delta$ is in force, the superscript $k = \delta_i$ may be
suppressed in (2.1.3).

In Section 2.3 families of distributions which are closed relative
to the consecutive sampling and $v$-step sampling rules are discussed in
detail. In order to carry out this discussion, some properties of the
matrix beta distribution are required. These properties are summarized
in Section 2.2.


The matrix beta density function, defined by equation (2.2.1) below,
will be shown to be the natural conjugate distribution for the likelihood
function of the consecutive sampling rule and, hence, is of intrinsic
importance. Moreover, as will be seen in Section 2.3, many of the
properties of arbitrary families of distributions which are closed relative
to the consecutive sampling rule or the $v$-step sampling rule are related
to characteristics of the matrix beta distribution. For these reasons,
the principal facts about this distribution are summarized in this section
without proof. Complete derivations are given in Chapter 6.

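The $K \times N$ random generalized stochastic matrix
$\tilde{\mathcal P} = [\tilde p_{ij}^k]$ is said to have the matrix beta
distribution with parameter $M = [m_{ij}^k]$ if $\tilde{\mathcal P}$ has the
joint density function

$$f_\beta(\mathcal P \mid M) = \begin{cases} C(M) \displaystyle\prod_{i=1}^{N}\prod_{k=1}^{K_i}\prod_{j=1}^{N} (p_{ij}^k)^{m_{ij}^k - 1}, & \mathcal P \in \mathcal S_{KN}, \\ 0, & \text{elsewhere}. \end{cases} \qquad (2.2.1)$$

The normalizing constant, $C(M)$, is given by

$$C(M) = \prod_{i=1}^{N}\prod_{k=1}^{K_i} \frac{\Gamma(m_i^k)}{\prod_{j=1}^{N} \Gamma(m_{ij}^k)}, \qquad (2.2.2)$$

where

$$m_i^k = \sum_{j=1}^{N} m_{ij}^k, \qquad k = 1, \ldots, K_i;\; i = 1, \ldots, N. \qquad (2.2.3)$$

The parameter $M$ is a $K \times N$ matrix such that

$$m_{ij}^k > 0, \qquad k = 1, \ldots, K_i;\; i, j = 1, \ldots, N. \qquad (2.2.4)$$

It is shown in Chapter 6 that

$$\int_{\mathcal S_{KN}} f_\beta(\mathcal P \mid M)\, d\mathcal P = 1. \qquad (2.2.5)$$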


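For $k = 1, \ldots, K_i$ and $i, j = 1, \ldots, N$, the means and variances of
the elements of $\tilde{\mathcal P}$ are given by the formulas

$$E(\tilde p_{ij}^k) = \frac{m_{ij}^k}{m_i^k}, \qquad (2.2.6)$$

$$V(\tilde p_{ij}^k) = \frac{m_{ij}^k\,(m_i^k - m_{ij}^k)}{(m_i^k)^2\,(m_i^k + 1)}. \qquad (2.2.7)$$

The covariances of the elements of $\tilde{\mathcal P}$ are

$$\operatorname{Cov}(\tilde p_{ij}^k, \tilde p_{i'j'}^{k'}) = \begin{cases} -\dfrac{m_{ij}^k\, m_{ij'}^k}{(m_i^k)^2\,(m_i^k + 1)}, & i' = i,\; k' = k,\; j' \ne j, \\ 0, & (i', k') \ne (i, k). \end{cases} \qquad (2.2.8)$$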
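As a quick exact check of (2.2.6)-(2.2.8), the sketch below computes the moments of a single row $\mathbf p_i^k$ from its parameter row, using rational arithmetic so that no rounding obscures the identities. It is an illustration only; `row_moments` is a hypothetical helper, and rows with different $(i, k)$ are left out because their covariances are zero.

```python
from fractions import Fraction


def row_moments(m_row):
    """Mean vector and covariance matrix of one row p_i^k of a matrix beta matrix.

    m_row = (m_i1^k, ..., m_iN^k); implements (2.2.6)-(2.2.8) for a single row.
    Elements in different rows (i, k) have zero covariance and are not returned.
    """
    m = [Fraction(x) for x in m_row]
    s = sum(m)  # m_i^k of (2.2.3)
    mean = [x / s for x in m]
    denom = s * s * (s + 1)
    cov = [[(x * (s - x) if a == b else -x * y) / denom for b, y in enumerate(m)]
           for a, x in enumerate(m)]
    return mean, cov


mean, cov = row_moments([1, 2, 1])
print(mean)       # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(cov[0][0])  # 3/80
```

Each row of the covariance matrix sums to zero, as it must, since the elements of a row of $\tilde{\mathcal P}$ sum to one.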
Let $\mathbf x_n = (x_0, x_1, \ldots, x_n)$ be a sample of $n$ transitions observed
under the consecutive sampling rule, where $x_0$ is the initial state, known in
advance of sampling. Let $f_{ij}^k$ denote the number of transitions in
$\mathbf x_n$ from state $i$ to state $j$ under the $k$th alternative in state $i$
($k = 1, \ldots, K_i$; $i, j = 1, \ldots, N$) and define the transition count of
the sample as the $K \times N$ matrix $F = [f_{ij}^k]$. Then the conditional
probability, given that $\tilde{\mathcal P} = \mathcal P$, of observing the
sample $\mathbf x_n$ is

$$\ell(\mathbf x_n \mid \mathcal P) = \prod_{i=1}^{N}\prod_{k=1}^{K_i}\prod_{j=1}^{N} (p_{ij}^k)^{f_{ij}^k}. \qquad (2.2.9)$$


If the rule by which the sample size $n$ was selected is noninformative in
the sense of Raiffa and Schlaifer [33], then (2.2.9) is the likelihood of
the sample $\mathbf x_n$. It is clear that $F$ is a sufficient statistic for
this data-generating process and that the natural conjugate distribution is
the matrix beta distribution.
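The sufficiency of $F$ can be checked directly: two paths that visit the same transitions in a different order have the same transition count and hence the same likelihood (2.2.9). The sketch below stores $F$ sparsely as a `Counter` keyed by (i, k, j) triples with 0-based indices; the helper names and the numbers in `calP` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction


def transition_count(transitions):
    """Sparse transition count F: maps (i, k, j) to f_ij^k."""
    return Counter(transitions)


def likelihood(calP, F):
    """(2.2.9): product of (p_ij^k) ** f_ij^k over the observed (i, k, j).

    calP[i][k] is the transition vector p_i^k (0-based indices).
    """
    value = Fraction(1)
    for (i, k, j), f in F.items():
        value *= Fraction(calP[i][k][j]) ** f
    return value


calP = [
    [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 4), Fraction(3, 4)]],
    [[Fraction(1, 3), Fraction(2, 3)]],
]
# Two paths from x_0 = 0 with the same transition count, alternative 1 used throughout.
path_a = [(0, 0, 0), (0, 0, 1), (1, 0, 0)]  # states 0 -> 0 -> 1 -> 0
path_b = [(0, 0, 1), (1, 0, 0), (0, 0, 0)]  # states 0 -> 1 -> 0 -> 0
F_a, F_b = transition_count(path_a), transition_count(path_b)
print(F_a == F_b, likelihood(calP, F_a))  # True 1/12
```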


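Theorem 2.2.1. Let the prior distribution of $\tilde{\mathcal P}$ be matrix beta
with parameter $M^\circ$ and suppose that a sample with transition count $F$ is
observed under the consecutive sampling rule with noninformative stopping. Then
the posterior distribution of $\tilde{\mathcal P}$ is matrix beta with parameter

$$M' = M^\circ + F. \qquad (2.2.10)$$

Proof. By Bayes' Theorem the posterior distribution, $f_\beta(\mathcal P \mid M^\circ, F)$,
is proportional to the product of the kernel of the likelihood function
and the kernel of the prior distribution,

$$f_\beta(\mathcal P \mid M^\circ, F) \propto \prod_{i=1}^{N}\prod_{k=1}^{K_i}\prod_{j=1}^{N} (p_{ij}^k)^{f_{ij}^k}\,(p_{ij}^k)^{(m^\circ)_{ij}^k - 1} = \prod_{i=1}^{N}\prod_{k=1}^{K_i}\prod_{j=1}^{N} (p_{ij}^k)^{(m^\circ)_{ij}^k + f_{ij}^k - 1}. \qquad (2.2.11)$$

The right side of (2.2.11) is the kernel of a matrix beta distribution
with parameter $M' = M^\circ + F$. Q.E.D.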


Corollary 2.2.2. The family of matrix beta distributions is closed
with respect to the consecutive sampling rule.
Proof. The corollary follows directly from Theorem 2.2.1. Q.E.D.
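Theorem 2.2.1 reduces Bayesian updating to adding counts. The sketch below applies $M' = M^\circ + F$ and then reads off a posterior mean with (2.2.6). It is a minimal illustration, assuming a dictionary keyed by hypothetical 0-based (i, k, j) triples that lists every entry of $M^\circ$; `posterior_parameter` and `mean_element` are illustrative names.

```python
from collections import Counter
from fractions import Fraction


def posterior_parameter(M0, F):
    """Theorem 2.2.1: M' = M0 + F, entry by entry.

    M0 must list every (i, k, j); F may be sparse (a Counter of observed triples).
    """
    M1 = dict(M0)
    for key, f in F.items():
        M1[key] += f
    return M1


def mean_element(M, i, k, j):
    """(2.2.6): E(p_ij^k) = m_ij^k / m_i^k."""
    row_total = sum(v for (a, b, _), v in M.items() if (a, b) == (i, k))
    return Fraction(M[(i, k, j)], row_total)


# Two states, one alternative each, prior parameter M0 with every entry equal to 1.
M0 = {(0, 0, 0): 1, (0, 0, 1): 1, (1, 0, 0): 1, (1, 0, 1): 1}
F = Counter({(0, 0, 1): 2, (0, 0, 0): 1})
M1 = posterior_parameter(M0, F)
print(M1[(0, 0, 1)], mean_element(M1, 0, 0, 1))  # 3 3/5
```

Rows that received no transitions keep their prior parameters, so their posterior means are unchanged.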


2.3 Families of Distributions Closed Under the Consecutive Sampling Rule
In the following chapters we shall confine our attention to models
based on either the consecutive sampling rule or the $v$-step sampling
rule. Some properties of families of distributions which are closed
under either of these rules are established in this section. Specifically,
it is shown that there are an unlimited number of distinct families of
distributions which are closed under the consecutive sampling rule, thus
allowing the decision-maker considerable latitude in selecting a prior
distribution for $\tilde{\mathcal P}$. A lemma of fundamental importance for the
development of the succeeding chapters is then proved. It is also shown that
the class of families closed under the $v$-step sampling rule is identical
with the class of families consisting of probability mixtures of distributions
from a family closed under consecutive sampling. It then follows that any
family of distributions closed under $v$-step sampling is also closed under
consecutive sampling. Finally, it is proven that, for an arbitrary prior
distribution on $\tilde{\mathcal P}$, if $n$ observations of the Markov chain are
obtained under either sampling rule, then, with probability one, the
probability mass of the posterior distribution tends to concentrate at
$\mathcal P$, the true state of nature, as $n \to \infty$.


2.3.1 Families Closed Under Consecutive Sampling.

In Section 2.2 it was shown that the natural conjugate distribution
for the consecutive sampling rule is the matrix beta distribution.
Extended natural conjugate distributions for this sampling rule can be
constructed as follows. Let $g(\mathcal P \mid \omega)$ be a nonnegative Borel
function* defined on $\mathcal S_{KN}$ which is positive over some subset of
$\mathcal S_{KN}$. The parameter $\omega$ is a point belonging to $\Omega$, a
subset of a Euclidean space. Let $M = [m_{ij}^k]$ be a $K \times N$ matrix such that

$$m_{ij}^k > 0, \qquad k = 1, \ldots, K_i;\; i, j = 1, \ldots, N. \qquad (2.3.1)$$

We assume that $g(\mathcal P \mid \omega)$ is sufficiently well-behaved that the integral

$$\int_{\mathcal S_{KN}} \prod_{i=1}^{N}\prod_{k=1}^{K_i}\prod_{j=1}^{N} (p_{ij}^k)^{m_{ij}^k - 1}\, g(\mathcal P \mid \omega)\, d\mathcal P = \frac{1}{C(M, \omega)} \qquad (2.3.2)$$

exists for all $\omega \in \Omega$ and all $M$ which satisfy (2.3.1). Let


$$h(\mathcal P \mid M, \omega) = C(M, \omega) \prod_{i=1}^{N}\prod_{k=1}^{K_i}\prod_{j=1}^{N} (p_{ij}^k)^{m_{ij}^k - 1}\, g(\mathcal P \mid \omega), \qquad \mathcal P \in \mathcal S_{KN}. \qquad (2.3.3)$$

The function $h(\mathcal P \mid M, \omega)$ is a nonnegative Borel function such that

$$\int_{\mathcal S_{KN}} h(\mathcal P \mid M, \omega)\, d\mathcal P = 1, \qquad (2.3.4)$$

and is, therefore, a probability density function.

* See Loève [28], pp. 106 ff., for a discussion of Borel functions. A
function which is continuous at all but a finite number of points can be
shown to be a Borel function.

Corresponding to any function $g(\mathcal P \mid \omega)$ which satisfies the
preceding requirements, we define the extended natural conjugate family,
$\mathcal H_g$, indexed by the ordered pair $(M, \omega)$, as the collection of
probability density functions, $h(\mathcal P \mid M, \omega)$, defined by
equation (2.3.3). The following theorem shows that $\mathcal H_g$ is closed
under the consecutive sampling rule.

Theorem 2.3.1. Let $\mathcal H_g$ be a family of probability density functions,
$h(\mathcal P \mid M, \omega)$, as defined by equation (2.3.3). If the prior
distribution on $\tilde{\mathcal P}$ is $h(\mathcal P \mid M^\circ, \omega) \in \mathcal H_g$
and if a sample $\mathbf x_n = (x_0, \ldots, x_n)$ with transition count
$F = [f_{ij}^k]$ is observed by consecutive sampling, then the posterior
distribution of $\tilde{\mathcal P}$ is $h(\mathcal P \mid M^\circ + F, \omega) \in \mathcal H_g$.
Thus, $\mathcal H_g$ is closed under the consecutive sampling rule.

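Proof. The posterior distribution of $\tilde{\mathcal P}$, $h(\mathcal P \mid M^\circ, \omega, \mathbf x_n)$,
is proportional to the product of the kernel of the likelihood function and
the kernel of the prior density function,

$$h(\mathcal P \mid M^\circ, \omega, \mathbf x_n) \propto \prod_{i=1}^{N}\prod_{k=1}^{K_i}\prod_{j=1}^{N} (p_{ij}^k)^{(m^\circ)_{ij}^k + f_{ij}^k - 1}\, g(\mathcal P \mid \omega), \qquad (2.3.5)$$

from which the theorem follows. Q.E.D.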


The parameter $\omega$ provides the decision-maker with additional
flexibility in encoding his prior knowledge about $\tilde{\mathcal P}$. It is to be
noted, however, that $\omega$ remains unchanged in the posterior distribution and
is, in that sense, a nuisance parameter. An example of an extended natural
conjugate distribution is presented in Section 6.4.
The next result is of fundamental importance for the development of
the succeeding chapters. Some additional notation is required. Let
$\mathbf y = (y_1, \ldots, y_{KN})$ be a point in the Euclidean space $E_{KN}$,
and let $I$ denote an interval in $E_{KN}$,

$$I = \{\mathbf y \mid a_l < y_l \le b_l,\; l = 1, \ldots, KN\}, \qquad (2.3.6)$$

where $a_l < b_l$ ($l = 1, \ldots, KN$). Let $Q$ be a partition of $I$ into a
finite number of mutually exclusive and exhaustive intervals, $I_1, \ldots, I_r$.
For each $I_\rho = \{\mathbf y \mid a_l^\rho < y_l \le b_l^\rho,\; l = 1, \ldots, KN\}$, let

$$v(I_\rho) = \max_{l} \,(b_l^\rho - a_l^\rho), \qquad \rho = 1, \ldots, r, \qquad (2.3.7)$$

and let $v = \max_\rho \{v(I_\rho)\}$. Finally, let $x_{ij}^k$ denote the event that a
transition occurs from state $i$ to state $j$ under the $k$th alternative in
state $i$.

Lemma 2.3.2. Let $H(\mathcal P \mid \psi) \in \mathcal H$, a family of distributions
closed under the consecutive sampling rule, and let $\phi(\mathcal P)$ be any
integrable function of $\mathcal P$ defined on $\mathcal S_{KN}$. Then the following
identity is valid:

$$\int_{\mathcal S_{KN}} p_{ij}^k\, \phi(\mathcal P)\, dH(\mathcal P \mid \psi) = E_\psi(\tilde p_{ij}^k) \int_{\mathcal S_{KN}} \phi(\mathcal P)\, dH(\mathcal P \mid T_{ij}^k(\psi)), \qquad i, j = 1, \ldots, N;\; k = 1, \ldots, K_i, \qquad (2.3.8)$$

where $E_\psi(\tilde p_{ij}^k)$ is the marginal expectation of $\tilde p_{ij}^k$.
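Proof. Let $I$ be an interval in $E_{KN}$ which contains $\mathcal S_{KN}$. For any
partition $Q$ of $I$, let $\mathcal P_\rho = [p_{ij}^k(\rho)]$ denote an arbitrary point
of $I_\rho$ ($\rho = 1, \ldots, r$) and let $A_\rho(\psi) = \Pr\{\tilde{\mathcal P} \in I_\rho\}$
when $\tilde{\mathcal P}$ has the distribution function $H(\mathcal P \mid \psi)$. Then

$$\int_{\mathcal S_{KN}} p_{ij}^k\, \phi(\mathcal P)\, dH(\mathcal P \mid \psi) = \lim_{v \to 0} \sum_{\rho=1}^{r} p_{ij}^k(\rho)\, \phi(\mathcal P_\rho)\, A_\rho(\psi). \qquad (2.3.9)$$

By Bayes' Theorem,

$$A_\rho(T_{ij}^k(\psi)) = \Pr\{\tilde{\mathcal P} \in I_\rho \mid x_{ij}^k\} = \frac{\Pr\{x_{ij}^k \mid \tilde{\mathcal P} \in I_\rho\}\, A_\rho(\psi)}{\Pr\{x_{ij}^k\}}. \qquad (2.3.10)$$

But

$$\Pr\{x_{ij}^k \mid \tilde{\mathcal P} \in I_\rho\} = \frac{1}{A_\rho(\psi)} \int_{I_\rho} p_{ij}^k\, dH(\mathcal P \mid \psi) \qquad (2.3.11)$$

and, by the mean value theorem, there is a point $\bar{\mathcal P}_\rho = [\bar p_{ij}^k(\rho)]$
of $I_\rho$ such that

$$\int_{I_\rho} p_{ij}^k\, dH(\mathcal P \mid \psi) = \bar p_{ij}^k(\rho)\, A_\rho(\psi). \qquad (2.3.12)$$

Since $\mathcal P_\rho$ is an arbitrary point of $I_\rho$, we may set
$\mathcal P_\rho = \bar{\mathcal P}_\rho$ in (2.3.9). Then, noting that
$\Pr\{x_{ij}^k\} = E_\psi(\tilde p_{ij}^k)$, equation (2.3.10) yields

$$\bar p_{ij}^k(\rho)\, A_\rho(\psi) = E_\psi(\tilde p_{ij}^k)\, A_\rho(T_{ij}^k(\psi)), \qquad (2.3.13)$$

and equation (2.3.9) becomes

$$\int_{\mathcal S_{KN}} p_{ij}^k\, \phi(\mathcal P)\, dH(\mathcal P \mid \psi) = \lim_{v \to 0} E_\psi(\tilde p_{ij}^k) \sum_{\rho=1}^{r} \phi(\bar{\mathcal P}_\rho)\, A_\rho(T_{ij}^k(\psi)) = E_\psi(\tilde p_{ij}^k) \int_{\mathcal S_{KN}} \phi(\mathcal P)\, dH(\mathcal P \mid T_{ij}^k(\psi)). \qquad (2.3.14)$$

Q.E.D.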
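For the matrix beta family the lemma can be checked exactly. Rows with different $(i, k)$ are independent and $T_{ij}^k$ changes only row $(i, k)$ (it adds one to $m_{ij}^k$, by Theorem 2.2.1), so it suffices to look at a single row; for the check we take $\phi$ to be a monomial in that row, an assumption made here for convenience rather than a restriction of the lemma. The sketch uses the standard Dirichlet moment formula $E[\prod_l \tilde p_l^{a_l}] = \prod_l \frac{\Gamma(m_l + a_l)}{\Gamma(m_l)} \cdot \frac{\Gamma(s)}{\Gamma(s + \sum_l a_l)}$ with $s = \sum_l m_l$; the helper names are hypothetical.

```python
from fractions import Fraction


def rising(x, n):
    """Rising factorial x (x + 1) ... (x + n - 1), equal to Gamma(x + n) / Gamma(x)."""
    out = Fraction(1)
    for t in range(n):
        out *= x + t
    return out


def row_moment(m, a):
    """E[prod_l p_l ** a_l] for one matrix beta row with parameters m (integers a_l >= 0)."""
    m = [Fraction(x) for x in m]
    num = Fraction(1)
    for ml, al in zip(m, a):
        num *= rising(ml, al)
    return num / rising(sum(m), sum(a))


def lemma_sides(m, a, j):
    """Both sides of (2.3.8) for one row, phi(P) = prod_l p_l ** a_l, transition to j."""
    e_j = [1 if l == j else 0 for l in range(len(m))]
    lhs = row_moment(m, [al + el for al, el in zip(a, e_j)])
    mean_j = Fraction(m[j]) / sum(Fraction(x) for x in m)  # E_psi(p_ij^k)
    updated = [ml + el for ml, el in zip(m, e_j)]           # T_ij^k(psi): m_ij^k + 1
    rhs = mean_j * row_moment(updated, a)
    return lhs, rhs


lhs, rhs = lemma_sides([1, 2, 3], [2, 0, 1], j=1)
print(lhs == rhs)  # True
```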


2.3.2 Families Closed Under v-Step Sampling.
Let us now consider the likelihood function associated with a $v$-step
sampling rule of size $n$. This sampling rule is described by the
sequence of transition numbers, $\{v_1, \ldots, v_n\}$, and by the sequence of
policies, $\{\delta_1, \ldots, \delta_n\}$. Let $\mathbf x_n = (x_0, x_1, \ldots, x_n)$
denote the resulting observations, where $x_0$ is the known initial state.
Letting $p_{ij}^{(v)}(\delta)$ denote the $(i, j)$th element of the matrix
$[P(\delta)]^v$, the conditional probability, given that $\tilde{\mathcal P} = \mathcal P$,
of observing the sample $\mathbf x_n$ is

$$\ell(\mathbf x_n \mid \mathcal P) = \prod_{t=1}^{n} p_{x_{t-1} x_t}^{(v_t)}(\delta_t). \qquad (2.3.15)$$

If the rule by which the sample size $n$ is chosen is noninformative, then
(2.3.15) is the likelihood of the sample $\mathbf x_n$.
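Equation (2.3.15) can be evaluated with matrix powers. The sketch below is an illustration under assumptions: `P_of` is a hypothetical function returning $P(\delta)$ (here it returns one fixed matrix, but the same code accepts a different matrix for each policy), indices are 0-based, and plain repeated multiplication is used since the $v_t$ are small.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[r][t] * B[t][c] for t in range(n)) for c in range(n)] for r in range(n)]


def mat_pow(A, v):
    """A ** v by repeated multiplication (v >= 1 is small in practice)."""
    n = len(A)
    R = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for _ in range(v):
        R = mat_mul(R, A)
    return R


def v_step_likelihood(P_of, observed, steps, policies):
    """(2.3.15): product over t of the (x_{t-1}, x_t) element of [P(delta_t)] ** v_t.

    P_of(delta) returns P(delta); observed = (x_0, x_1, ..., x_n).
    """
    value = Fraction(1)
    for t, (v, delta) in enumerate(zip(steps, policies), start=1):
        value *= mat_pow(P_of(delta), v)[observed[t - 1]][observed[t]]
    return value


P = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 4), Fraction(3, 4)]]
print(v_step_likelihood(lambda delta: P, [0, 1, 1], [2, 1], [(0, 0), (0, 0)]))  # 15/32
```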
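Let $\mathcal H$ be a family of probability distributions indexed by $\psi \in \Psi$.
For any fixed positive integer $m$ let $\alpha = (\alpha_1, \ldots, \alpha_m)$ be a
stochastic vector. A probability mixture of distributions from $\mathcal H$ is
defined to be the weighted sum

$$H^*(\mathcal P \mid \psi_1, \ldots, \psi_m, \alpha) = \sum_{l=1}^{m} \alpha_l\, H(\mathcal P \mid \psi_l), \qquad (2.3.16)$$

where $H(\mathcal P \mid \psi_l) \in \mathcal H$ ($l = 1, \ldots, m$). It is clear from the
definition that $H^*(\mathcal P \mid \psi_1, \ldots, \psi_m, \alpha)$ is also a
probability distribution function for $\tilde{\mathcal P}$. The mixed extension of
$\mathcal H$ is defined to be $\mathcal H^*$, the family of probability mixtures of
distributions from $\mathcal H$ as $\psi_1, \ldots, \psi_m$ range over $\Psi$, as
$\alpha$ ranges over $\mathcal S_m$, the set of $m$-dimensional stochastic vectors
(for fixed $m$), and as $m$ ranges over the positive integers. Since
$H(\mathcal P \mid \psi)$ is trivially a probability mixture, $\mathcal H \subset \mathcal H^*$.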

The following theorems establish that a family of distributions is
closed under $v$-step sampling if and only if it is the mixed extension of
a family closed under consecutive sampling, and that such a mixed extension
is also closed under the consecutive sampling rule.


Theorem 2.3.3. Let $\mathcal H$ be a family of distributions closed under
consecutive sampling and let $\mathcal H^*$ be its mixed extension. Then
$\mathcal H^*$ is also closed under consecutive sampling.
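Proof. Let $\mathbf x_n$ denote a sample of size $n$ obtained by consecutive
sampling. If the prior distribution of $\tilde{\mathcal P}$ is
$H^*(\mathcal P \mid \psi_1^\circ, \ldots, \psi_m^\circ, \alpha^\circ) \in \mathcal H^*$,
where $\alpha^\circ = (\alpha_1^\circ, \ldots, \alpha_m^\circ)$, and if
$\ell(\mathbf x_n \mid \mathcal P)$ is the likelihood function, then the posterior
distribution is, using Bayes' Theorem and equation (2.3.16),

$$dH^*(\mathcal P \mid \psi_1^\circ, \ldots, \psi_m^\circ, \alpha^\circ, \mathbf x_n) = \frac{\ell(\mathbf x_n \mid \mathcal P) \sum_{l=1}^{m} \alpha_l^\circ\, dH(\mathcal P \mid \psi_l^\circ)}{\sum_{l=1}^{m} \alpha_l^\circ \int_{\mathcal S_{KN}} \ell(\mathbf x_n \mid \mathcal P)\, dH(\mathcal P \mid \psi_l^\circ)} = \sum_{l=1}^{m} \alpha_l'\, dH(\mathcal P \mid \psi_l'), \qquad (2.3.17)$$

where

$$\alpha_l' = \frac{\alpha_l^\circ \int_{\mathcal S_{KN}} \ell(\mathbf x_n \mid \mathcal P)\, dH(\mathcal P \mid \psi_l^\circ)}{\sum_{s=1}^{m} \alpha_s^\circ \int_{\mathcal S_{KN}} \ell(\mathbf x_n \mid \mathcal P)\, dH(\mathcal P \mid \psi_s^\circ)}, \qquad l = 1, \ldots, m, \qquad (2.3.18)$$

and $\psi_l'$ is defined by (2.1.2), that is, $\psi_l' = T(\psi_l^\circ, \mathbf x_n)$.
Since $\alpha' = (\alpha_1', \ldots, \alpha_m')$ is a stochastic vector, the
posterior distribution of $\tilde{\mathcal P}$ is
$H^*(\mathcal P \mid \psi_1', \ldots, \psi_m', \alpha') \in \mathcal H^*$. Q.E.D.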
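When every component of the mixture is matrix beta, the integrals in (2.3.18) have a closed form: for each row, $\int \prod_j (p_{ij}^k)^{f_{ij}^k}\, dH$ is the Dirichlet moment, so the whole integral equals $C(M_l)/C(M_l + F)$, and each component parameter is updated by Theorem 2.2.1. The sketch below is a minimal illustration under those assumptions; rows are keyed by hypothetical 0-based (i, k) pairs and the function names are illustrative.

```python
from fractions import Fraction


def rising(x, n):
    """Rising factorial x (x + 1) ... (x + n - 1)."""
    out = Fraction(1)
    for t in range(n):
        out *= x + t
    return out


def zeros_like(row):
    """Count row for a row in which no transitions were observed."""
    return [0] * len(row)


def row_marginal(m_row, f_row):
    """Prior predictive probability of one particular sequence of transitions with
    counts f_row, for a single matrix beta row with parameters m_row."""
    num = Fraction(1)
    for m, f in zip(m_row, f_row):
        num *= rising(Fraction(m), f)
    return num / rising(Fraction(sum(m_row)), sum(f_row))


def mixture_update(alphas, components, F):
    """(2.3.17)-(2.3.18) when every component is matrix beta.

    components[l] maps a row key (i, k) to the parameter row of component l;
    F maps row keys to count rows (missing rows have zero counts).
    Returns the posterior weights alpha' and the updated components M_l + F.
    """
    marginals = []
    for M in components:
        value = Fraction(1)
        for key, m_row in M.items():
            value *= row_marginal(m_row, F.get(key, zeros_like(m_row)))
        marginals.append(value)
    joint = [Fraction(a) * c for a, c in zip(alphas, marginals)]
    total = sum(joint)
    new_alphas = [x / total for x in joint]
    new_components = [
        {key: [m + f for m, f in zip(m_row, F.get(key, zeros_like(m_row)))]
         for key, m_row in M.items()}
        for M in components
    ]
    return new_alphas, new_components


# Two-component mixture on a single row (i, k) = (0, 0); two transitions to state 0 observed.
alphas, comps = mixture_update(
    [Fraction(1, 2), Fraction(1, 2)],
    [{(0, 0): [1, 1]}, {(0, 0): [4, 1]}],
    {(0, 0): [2, 0]},
)
print(alphas)  # [Fraction(1, 3), Fraction(2, 3)]
```

The component that placed more prior weight on transitions to state 0 gains posterior weight, while the mixture stays inside the mixed extension, as Theorem 2.3.3 asserts.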


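Theorem 2.3.4. Let $\mathcal H'$ be a family of probability distributions
indexed by $\psi \in \Psi$. A necessary and sufficient condition that $\mathcal H'$ be
closed under the $v$-step sampling rule is that $\mathcal H'$ be the mixed extension
of a family of distributions closed under the consecutive sampling rule.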


Proof. First assume that $n = 1$. Let $x_1(v)$ denote the observation of a transition from $i$ to $j$ over a transition interval of length $v$ under the policy $\sigma = (\sigma_1, \dots, \sigma_N)$. The likelihood function is

$$L(x_1(v) \mid P) = p_{ij}^{(v)}(\sigma) = \sum_{i_1=1}^{N} \cdots \sum_{i_{v-1}=1}^{N} p_{\sigma_i i_1}\, p_{\sigma_{i_1} i_2} \cdots p_{\sigma_{i_{v-1}} j}, \qquad (2.3.19)$$

which is the sum of $N^{v-1}$ terms, each of which is the likelihood function for a sample sequence of length $v$ observed under the consecutive sampling rule. Let $H'(P \mid \tau') \in \mathcal{H}'$ be the prior distribution of $\tilde P$. The differential form of the posterior distribution has the kernel

$$dH'(P \mid \tau', x_1(v)) \propto \sum_{i_1=1}^{N} \cdots \sum_{i_{v-1}=1}^{N} p_{\sigma_i i_1} \cdots p_{\sigma_{i_{v-1}} j}\; dH'(P \mid \tau'). \qquad (2.3.20)$$

If $\mathcal{H}'$ is the mixed extension of a family closed under consecutive sampling, then Theorem 2.3.3 and equation (2.3.20) imply that $H'(P \mid \tau', x_1(v)) \in \mathcal{H}'$. Moreover, if $\mathcal{H}'$ is not the mixed extension of a family of distributions closed under consecutive sampling, then $p_{\sigma_i i_1} \cdots p_{\sigma_{i_{v-1}} j}\, dH'(P \mid \tau')$ cannot be the kernel of a distribution in $\mathcal{H}'$ for all $v$, $\sigma$, and $\tau'$. Then, for some $v$, $\sigma$, and $\tau'$, the posterior distribution is a probability mixture of distributions, not all of which are in $\mathcal{H}'$, and, therefore, the posterior distribution is not a member of $\mathcal{H}'$. Thus, we have established necessity and sufficiency for the case $n = 1$.

For $n > 1$, the differential form of the posterior distribution of $\tilde P$ is

$$dH'(P \mid \tau', x_n) \propto \Big[\prod_{m=1}^{n} p_{i_m j_m}^{(v_m)}(\sigma^m)\Big] dH'(P \mid \tau') \propto p_{i_n j_n}^{(v_n)}(\sigma^n)\; dH'(P \mid \tau', x_{n-1}), \qquad (2.3.21)$$

where the $m$th observation is a transition from $i_m$ to $j_m$ over an interval of length $v_m$ under the policy $\sigma^m$, and the theorem follows by induction. Q.E.D.
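Equation (2.3.19) can be checked by brute force. The sketch below, with hypothetical function names, enumerates the $N^{v-1}$ intermediate state sequences and compares the sum with the $(i, j)$ element of the $v$th power of the $N \times N$ matrix $P(\sigma)$ whose $i$th row is row $\sigma_i$ of $P$. It assumes $P$ is stored as a list of rows and $\sigma$ as a tuple of row indices.

```python
from itertools import product

def v_step_by_paths(P, sigma, i, j, v):
    """Sum over all N**(v-1) intermediate state sequences, as in (2.3.19).

    P is a K x N generalized stochastic matrix (list of rows); sigma[s] is the
    row (alternative) used in state s under the policy."""
    N = len(sigma)
    total, terms = 0.0, 0
    for middle in product(range(N), repeat=v - 1):
        states = (i, *middle, j)
        prob = 1.0
        for a, b in zip(states, states[1:]):
            prob *= P[sigma[a]][b]       # one consecutive-sampling factor
        total += prob
        terms += 1
    return total, terms

def v_step_by_power(P, sigma, i, j, v):
    """The same probability from the v-th power of the N x N matrix P(sigma)."""
    N = len(sigma)
    P_sigma = [P[sigma[s]] for s in range(N)]
    row = [1.0 if s == i else 0.0 for s in range(N)]   # start in state i
    for _ in range(v):
        row = [sum(row[s] * P_sigma[s][t] for s in range(N)) for t in range(N)]
    return row[j]

# Three alternatives: rows 0 and 1 belong to state 0, row 2 to state 1.
P = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
sigma = (1, 2)
total, terms = v_step_by_paths(P, sigma, 0, 1, 4)
print(terms, total, v_step_by_power(P, sigma, 0, 1, 4))
```

Each of the $N^{v-1}$ path products is a consecutive-sampling likelihood, which is why a v-step observation produces a mixture in the posterior.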
Corollary 2.3.5. If $\mathcal{H}'$ is a family of distributions closed under the v-step sampling rule, then $\mathcal{H}'$ is also closed under the consecutive sampling rule.
Proof. The corollary follows immediately from Theorems 2.3.3 and 2.3.4. Q.E.D.


2.3.3 Large Sample Theory. Let $H(P \mid \tau)$ be an arbitrary prior distribution function of $\tilde P$. We now show that, if a sample of size $n$ is observed under either the consecutive or the v-step sampling rule, the probability mass of the posterior distribution tends, as $n \to \infty$, to concentrate at $Q$, the true state of nature, with probability one. This statement is made precise in Theorems 2.3.8 and 2.3.9. Not only are these results of interest on their own merits, but an important application of Theorems 2.3.8 and 2.3.9 will be made in Chapter 5, where the question of termination of sampling is considered for terminal control models.

Consider a sample of size $n$ obtained under the v-step sampling rule. For a fixed state $i$, a fixed policy $\sigma$, and a fixed transition interval $v$, we shall say a trial occurs whenever the system, starting from state $i$, makes a transition over a transition interval of length $v$ under the policy $\sigma$. For a fixed state $j$, let there be associated with the $m$th trial the random variable $X_m(j)$, which takes the value 1 if the system is next observed in state $j$ and the value zero otherwise. A sample of size $n$ thus generates a sequence $X_1(j), X_2(j), \dots$ of independent, identically distributed random variables which, if $Q$ is the true state of nature, have the probability function

$$\Pr[X_m(j) = 1] = q_{ij}^{(v)}(\sigma), \qquad m = 1, 2, \dots, \qquad (2.3.22a)$$

$$\Pr[X_m(j) = 0] = 1 - q_{ij}^{(v)}(\sigma), \qquad m = 1, 2, \dots, \qquad (2.3.22b)$$

and expected value

$$E[X_m(j)] = q_{ij}^{(v)}(\sigma), \qquad m = 1, 2, \dots. \qquad (2.3.23)$$

The following lemma is an immediate consequence of the strong law of large numbers.

Lemma 2.3.6. Let $(X_1(j), \dots, X_m(j))$ be an observation of size $m$ of the sequence of trials defined above, for fixed states $i$ and $j$, a fixed policy $\sigma$, and a fixed transition interval $v$. If, as $n \to \infty$, state $i$ is entered an infinite number of times and the policy $\sigma$ and transition interval $v$ are used infinitely often when in state $i$, we have, with probability one,

$$\lim_{m \to \infty} \frac{1}{m} \sum_{l=1}^{m} X_l(j) = q_{ij}^{(v)}(\sigma), \qquad j = 1, \dots, N, \qquad (2.3.24)$$

where $Q$ is the true state of nature.
We remark that, if $v = 1$ and $\sigma_i = k$, Lemma 2.3.6 applies to the consecutive sampling rule and equation (2.3.24) becomes

$$\lim_{m \to \infty} \frac{1}{m} \sum_{l=1}^{m} X_l(j) = q_{kj}, \qquad j = 1, \dots, N, \qquad (2.3.25)$$

the limit holding with probability one.
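A short simulation illustrates (2.3.25) under stated assumptions: a single known row $q$ of the true matrix $Q$ is sampled repeatedly, as under the consecutive sampling rule with one alternative, and the relative frequencies are compared with $q$. The row, the seed, and the sample size are arbitrary choices, and the function name is hypothetical.

```python
import random

def empirical_frequencies(q, m, seed=0):
    """Relative frequencies (1/m) * sum_l X_l(j) for each state j after m trials."""
    rng = random.Random(seed)
    counts = [0] * len(q)
    for _ in range(m):
        j = rng.choices(range(len(q)), weights=q)[0]
        counts[j] += 1            # X_l(j) = 1 only for the observed next state j
    return [c / m for c in counts]

q = [0.2, 0.5, 0.3]               # assumed true row of Q
freqs = empirical_frequencies(q, 100_000)
print(freqs)
```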


A generalized stochastic matrix, $Q = [q_{kj}]$, is said to be positive if all of its elements are positive, which implies that

$$0 < q_{kj} < 1, \qquad k = 1, \dots, K; \; j = 1, \dots, N. \qquad (2.3.26)$$


Lemma 2.3.7. Let $x_n$ be a sample of size $n$ obtained under the v-step sampling rule. Assume that, as $n \to \infty$, a fixed state $i$ is observed infinitely often and that, when in state $i$, the policy $\sigma$ and transition interval $v$ are used infinitely often. Then, if the true state of nature, $Q$, is a positive matrix, every state $j$ $(j = 1, \dots, N)$ is, with probability one, observed infinitely often.

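Proof. For fixed states $i$ and $j$, the policy $\sigma$, and the transition interval $v$, let $\{X_m(j)\}$ be the sequence of trials generated by the sample $x_n$ as defined above. The hypotheses of the lemma imply that the number of trials tends to infinity with $n$, and we have, by Lemma 2.3.6,

$$\lim_{m \to \infty} \frac{1}{m} \sum_{l=1}^{m} X_l(j) = q_{ij}^{(v)}(\sigma), \qquad j = 1, \dots, N, \qquad (2.3.27)$$

with probability one. Since $Q$ is positive, every term of the v-step sum (2.3.19) evaluated at $Q$ is positive, so $q_{ij}^{(v)}(\sigma) > 0$ for $j = 1, \dots, N$, and (2.3.27) implies that, with probability one, $X_m(j) = 1$ infinitely often for each state $j$.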


This lemma can probably be proved under a weaker assumption on $Q$, but it is sufficient for our purposes to assume that $Q$ is positive. [...] the set of non-positive matrices is a set of measure zero.

We again remark that, if $v = 1$ and $\sigma_i = k$, Lemma 2.3.7 applies to samples obtained under the consecutive sampling rule as well as under the v-step sampling rule.

Let $\varepsilon$ be an arbitrary positive number and define $\boldsymbol{\varepsilon}$ to be the $K \times N$ matrix each element of which is $\varepsilon$. For any $K \times N$ matrices $P$ and $Q$ we say that

$$|P - Q| < \boldsymbol{\varepsilon} \qquad (2.3.28)$$

if

$$|p_{kj} - q_{kj}| < \varepsilon, \qquad k = 1, \dots, K; \; j = 1, \dots, N. \qquad (2.3.29)$$

Clearly, if (2.3.28) holds, then

$$\|P - Q\| = \Big\{\sum_{k=1}^{K} \sum_{j=1}^{N} (p_{kj} - q_{kj})^2\Big\}^{1/2} < \varepsilon\sqrt{KN}, \qquad (2.3.30)$$

so that $\|P - Q\|$ can be made arbitrarily small by an appropriate choice of $\varepsilon$. Let $H(P \mid \tau)$ be an arbitrary prior distribution function of $\tilde P$ and assume that a sample, $x_n$, of size $n$ is observed. Denote by $H(P \mid \tau, x_n)$ the posterior distribution of $\tilde P$ and, for fixed $Q$, let

$$\Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid x_n\} = \int_E dH(P \mid \tau, x_n) \qquad (2.3.31)$$

denote the posterior probability of the set

$$E = \{P \mid |P - Q| < \boldsymbol{\varepsilon}\}. \qquad (2.3.32)$$

When we say that the posterior probability mass tends, as $n \to \infty$, to concentrate at $Q$, the true state of nature, with probability one, we mean that, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} \Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid x_n\} = 1, \qquad (2.3.33)$$

the limit holding with probability one.
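The elementwise relation (2.3.28) and the bound (2.3.30) can be checked directly on a small example. The $3 \times 2$ matrices below are arbitrary illustrations and the function names are hypothetical.

```python
import math

def within(P, Q, eps):
    """Elementwise relation |P - Q| < eps of (2.3.28)-(2.3.29)."""
    return all(abs(p - q) < eps for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def euclidean_distance(P, Q):
    """||P - Q|| as in (2.3.30)."""
    return math.sqrt(sum((p - q) ** 2 for rp, rq in zip(P, Q) for p, q in zip(rp, rq)))

P = [[0.52, 0.48], [0.11, 0.89], [0.70, 0.30]]
Q = [[0.50, 0.50], [0.10, 0.90], [0.69, 0.31]]
eps = 0.03
K, N = len(P), len(P[0])
print(within(P, Q, eps), euclidean_distance(P, Q), eps * math.sqrt(K * N))
```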
Theorem 2.3.8. Let $H(P \mid \tau)$ be an arbitrary prior distribution function of $\tilde P$. Let $x_n$ be a sample of size $n$ obtained from a Markov chain with alternatives under the consecutive sampling rule, where the sampling strategy is such that, as $n \to \infty$, if state $i$ is entered infinitely often, every alternative in state $i$ is sampled infinitely often $(i = 1, \dots, N)$. If $Q$, the true state of nature, is a positive matrix, then, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} \Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid x_n\} = 1, \qquad (2.3.34)$$

the limit holding with probability one, provided $H(P \mid \tau)$ assigns positive probability to the set $E$ defined by equation (2.3.32).
Proof. Let $F(n) = [f_{kj}(n)]$ be the transition count of the sample $x_n$. The posterior distribution of $\tilde P$ is $H(P \mid \tau, x_n)$, where

$$dH(P \mid \tau, x_n) = \frac{\prod_{k=1}^{K} \prod_{j=1}^{N} p_{kj}^{f_{kj}(n)}\; dH(P \mid \tau)}{\int \prod_{k=1}^{K} \prod_{j=1}^{N} p_{kj}^{f_{kj}(n)}\; dH(P \mid \tau)}. \qquad (2.3.35)$$

Let

$$m_{kj}(n) = f_{kj}(n) + 1, \qquad M(n) = [m_{kj}(n)], \qquad (2.3.36)$$

and multiplying the numerator and denominator of (2.3.35) by the constant $c(M(n))$ defined by equation (2.2.2), we have

$$dH(P \mid \tau, x_n) = \frac{f(P \mid M(n))\; dH(P \mid \tau)}{\int f(P \mid M(n))\; dH(P \mid \tau)}. \qquad (2.3.37)$$

Let

$$\nu_k(n) = \sum_{j=1}^{N} f_{kj}(n), \qquad k = 1, \dots, K, \qquad (2.3.38)$$

denote the number of times that alternative $k$ is used in a sample of size $n$. As $n \to \infty$, at least one of the states of the chain is entered infinitely often. Lemma 2.3.7 and the hypotheses of the theorem imply that, with probability one, every state is entered infinitely often. Thus, under the assumed sampling strategy, $\nu_k(n) \to \infty$ as $n \to \infty$ with probability one $(k = 1, \dots, K)$. The mean of the matrix beta distribution $f(P \mid M(n))$ is $\hat P(n) = [\hat p_{kj}(n)]$, where

$$\hat p_{kj}(n) = \frac{f_{kj}(n) + 1}{\nu_k(n) + N}, \qquad k = 1, \dots, K; \; j = 1, \dots, N. \qquad (2.3.39)$$











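Thus, if $Q = [q_{kj}]$, Lemma 2.3.6, in the form (2.3.25), gives $f_{kj}(n)/\nu_k(n) \to q_{kj}$ and hence, since $\nu_k(n) \to \infty$, implies that, with probability one,

$$\lim_{n \to \infty} \hat p_{kj}(n) = q_{kj} > 0, \qquad k = 1, \dots, K; \; j = 1, \dots, N. \qquad (2.3.40)$$

We now show that, as $n \to \infty$, the probability mass of $f(P \mid M(n))$ tends, with probability one, to concentrate at $Q$. If $\tilde P$ is a random matrix with the density function $f(P \mid M(n))$, the marginal variance of $\tilde p_{kj}$ is

$$\sigma_{kj}^2(n) = \frac{\hat p_{kj}(n)\,[1 - \hat p_{kj}(n)]}{\nu_k(n) + N + 1}, \qquad k = 1, \dots, K; \; j = 1, \dots, N, \qquad (2.3.41)$$

so that

$$\lim_{n \to \infty} \sigma_{kj}^2(n) = 0, \qquad k = 1, \dots, K; \; j = 1, \dots, N, \qquad (2.3.42)$$

with probability one. Let $\boldsymbol{\varepsilon}$ be as in equation (2.3.32), and let

$$\Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid M(n)\} = \int_E f(P \mid M(n))\, dP \qquad (2.3.43)$$

denote the probability that a random matrix with density $f(P \mid M(n))$ lies in $E$.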
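De Morgan's law yields

$$\{|\tilde P - Q| < \boldsymbol{\varepsilon}\}^C = \bigcup_{k=1}^{K} \bigcup_{j=1}^{N} \{|\tilde p_{kj} - q_{kj}| \ge \varepsilon\},$$

where $C$ denotes the set complement. Hence

$$\Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid M(n)\} \ge 1 - \sum_{k=1}^{K} \sum_{j=1}^{N} \Pr\{|\tilde p_{kj} - q_{kj}| \ge \varepsilon \mid M(n)\}. \qquad (2.3.44)$$

Since the mean and variance of $\tilde p_{kj}$ are $\hat p_{kj}(n)$ and $\sigma_{kj}^2(n)$, Chebyshev's inequality gives, whenever $|\hat p_{kj}(n) - q_{kj}| < \varepsilon$,

$$\Pr\{|\tilde p_{kj} - q_{kj}| \ge \varepsilon \mid M(n)\} \le \Pr\{|\tilde p_{kj} - \hat p_{kj}(n)| \ge \varepsilon - |\hat p_{kj}(n) - q_{kj}| \mid M(n)\} \le \frac{\sigma_{kj}^2(n)}{[\varepsilon - |\hat p_{kj}(n) - q_{kj}|]^2}. \qquad (2.3.45)$$

Let $\delta > 0$ be arbitrary. By equations (2.3.40) and (2.3.42), there exists an integer $n^*$ such that, for all $n > n^*$,

$$|\hat p_{kj}(n) - q_{kj}| < \frac{\varepsilon}{2}, \qquad \sigma_{kj}^2(n) < \frac{\delta \varepsilon^2}{4KN}, \qquad k = 1, \dots, K; \; j = 1, \dots, N,$$

with probability one. Thus, for $n > n^*$, each term of the sum in (2.3.44) is, by (2.3.45), less than $\delta/(KN)$, so that

$$\sum_{k=1}^{K} \sum_{j=1}^{N} \Pr\{|\tilde p_{kj} - q_{kj}| \ge \varepsilon \mid M(n)\} < \delta, \qquad n > n^*, \qquad (2.3.50)$$

and the inequality (2.3.44) becomes

$$\Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid M(n)\} > 1 - \delta, \qquad n > n^*, \qquad (2.3.51)$$

with probability one. Since $\delta$ is arbitrary,

$$\lim_{n \to \infty} \Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid M(n)\} = 1, \qquad (2.3.52)$$

the limit holding with probability one.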


Now, defining $E$ as in (2.3.32) and letting $E^C$ be the complement of $E$ in $\mathcal{P}_{K \times N}$, we have, from equation (2.3.37),

$$\Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid x_n\} = \frac{a(n)}{a(n) + b(n)}, \qquad a(n) = \int_E f(P \mid M(n))\, dH(P \mid \tau), \qquad b(n) = \int_{E^C} f(P \mid M(n))\, dH(P \mid \tau).$$

[...] the compact set $E^C$, equation (2.3.52) implies that $\lim_{n \to \infty} b(n) = 0$, with probability one. Thus, [...] (2.3.58) and, with probability one,

$$\lim_{n \to \infty} \Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid x_n\} = 1.$$

Q.E.D.
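The moment formulas (2.3.39) and (2.3.41) and the Chebyshev step behind (2.3.44) can be illustrated numerically for a single row of $P$. The sketch assumes consecutive sampling from a known true row $q$ and bounds the probability, under the matrix beta posterior, that every element of that row lies within $\varepsilon$ of $q$. It is an illustration of the argument, not part of the proof; the function names, the seed and the row are hypothetical.

```python
import random

def posterior_moments(f_row):
    """Mean (2.3.39) and variance (2.3.41) of each element of one row under the
    matrix beta posterior with parameters m_kj(n) = f_kj(n) + 1."""
    N = len(f_row)
    nu = sum(f_row)                          # nu_k(n), the number of uses of this row
    means = [(f + 1) / (nu + N) for f in f_row]
    variances = [p * (1 - p) / (nu + N + 1) for p in means]
    return means, variances

def chebyshev_lower_bound(f_row, q_row, eps):
    """Union bound plus Chebyshev: a lower bound on the posterior probability that
    every element of the row lies within eps of q_row. Returns None when some
    posterior mean is eps or more away from q, where the bound does not apply."""
    means, variances = posterior_moments(f_row)
    miss = 0.0
    for p_hat, var, q in zip(means, variances, q_row):
        gap = eps - abs(p_hat - q)
        if gap <= 0:
            return None
        miss += var / gap ** 2               # Pr{|p - q| >= eps} <= var / gap**2
    return max(0.0, 1.0 - miss)

rng = random.Random(1)
q = [0.6, 0.4]                               # assumed true row of Q
f = [0, 0]
for n in (10, 100, 1000, 10000):
    while sum(f) < n:
        f[rng.choices(range(2), weights=q)[0]] += 1
    print(n, posterior_moments(f)[0], chebyshev_lower_bound(f, q, 0.05))
```

As the row is used more often, the posterior means approach $q$ and the lower bound approaches one, mirroring (2.3.40), (2.3.42) and (2.3.52).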
Theorem 2.3.9. Let $H(P \mid \tau)$ be an arbitrary prior distribution function of $\tilde P$. Let $x_n$ be a sample of size $n$ obtained from a Markov chain with alternatives under the v-step sampling rule. Assume that, when the system is observed in state $i$, the sampling rule is restricted to policies from $\Sigma_i \subset \Sigma$ and to transition intervals from the finite set $I_i$, such that, as $n \to \infty$, if state $i$ is observed infinitely often, every policy in $\Sigma_i$ and every transition interval in $I_i$ are used infinitely often $(i = 1, \dots, N)$. If $Q$, the true state of nature, is a positive matrix, then, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} \Pr\{|\tilde P - Q| < \boldsymbol{\varepsilon} \mid x_n\} = 1,$$

the limit holding with probability one, provided $H(P \mid \tau)$ assigns positive probability to the set $E$ defined by equation (2.3.32).

Proof. Let $K_i$ be the total number of pairs $(\sigma, v)$, where $\sigma \in \Sigma_i$ and $v \in I_i$ $(i = 1, \dots, N)$, and let $\bar K = \sum_{i=1}^{N} K_i$. When in state $i$, let $k$ index the possible policy and transition interval combinations $(\sigma, v)$, $\sigma \in \Sigma_i$ and $v \in I_i$. Let

$$r_{kj} = p_{ij}^{(v)}(\sigma), \qquad j = 1, \dots, N,$$

and define the $\bar K \times N$ matrix $R = [r_{kj}]$. Clearly, $R$ is a generalized stochastic matrix. If the index $k$ corresponds to the pair $(\sigma, v)$, let $f_{kj}(n)$ be the number of times a transition occurred from state $i$ to state $j$, in the sample $x_n$, over the transition interval $v$ when the system was operated under the policy $\sigma$. Then the posterior distribution of $\tilde P$ is $H(P \mid \tau, x_n)$, where

$$dH(P \mid \tau, x_n) = \frac{\prod_{k=1}^{\bar K} \prod_{j=1}^{N} r_{kj}^{f_{kj}(n)}\; dH(P \mid \tau)}{\int \prod_{k=1}^{\bar K} \prod_{j=1}^{N} r_{kj}^{f_{kj}(n)}\; dH(P \mid \tau)} = \frac{f(R \mid M(n))\; dH(P \mid \tau)}{\int f(R \mid M(n))\; dH(P \mid \tau)},$$

where $M(n) = [f_{kj}(n) + 1]$. The proof of the theorem from this point is identical to the proof of Theorem 2.3.8. Q.E.D.


We remark that the assumptions in Theorems 2.3.8 and 2.3.9 concerning policies which are used infinitely often are not restrictive. It will usually be possible, after a finite amount of sampling, to eliminate from further consideration those policies which are used only a finite number of times. Examples of such elimination of policies by dominance arguments will be given in Chapter 5. In any case, the theorems apply to the marginal distribution of those alternative rows of $\tilde P$ which are observed infinitely often.


2.4 Some General Properties of Closed Families of Distributions.
Let $\mathcal{H}$ be a family of distributions indexed by $\tau \in T$ which is closed under an arbitrary sampling rule. Some general properties of $\mathcal{H}$ are derived in this section. The symbol $L(x_n \mid P)$ will be used throughout for the likelihood of a sample of size $n$, conditional on $\tilde P = P$, under the given sampling rule.


Theorem 2.4.1. Let $\tilde P$ have a discrete prior distribution,

$$\Pr\{\tilde P = P_i\} = \alpha_i, \qquad i = 1, \dots, m, \qquad (2.4.1)$$

where $\alpha_i \ge 0$ and $\sum_{i=1}^{m} \alpha_i = 1$. For a fixed integer $m$, let $\mathcal{H}_m$ be the family of all such discrete distributions, indexed by $\alpha = (\alpha_1, \dots, \alpha_m)$. Then $\mathcal{H}_m$ is closed under all sampling rules.

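Proof. Let $L(x_n \mid P)$ be the likelihood function for an arbitrary sampling rule. If $\alpha^0$ is the prior distribution of $\tilde P$, the posterior probability of $P_i$ is

$$\alpha_i^n = \frac{\alpha_i^0\, L(x_n \mid P_i)}{\sum_{l=1}^{m} \alpha_l^0\, L(x_n \mid P_l)}, \qquad i = 1, \dots, m, \qquad (2.4.2)$$

where $\alpha_i^n \ge 0$ and $\sum_{i=1}^{m} \alpha_i^n = 1$, so that $\alpha^n = (\alpha_1^n, \dots, \alpha_m^n)$ indexes a member of $\mathcal{H}_m$. Q.E.D.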
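A minimal sketch of the update (2.4.2) follows, assuming the consecutive-sampling likelihood $\prod_k \prod_j p_{kj}^{f_{kj}}$. For any other sampling rule only the likelihood function would change, which is the content of the theorem. The support matrices, counts, and function name are hypothetical illustrations.

```python
import math

def discrete_posterior(alphas, support, F):
    """Posterior weights (2.4.2) for a prior putting mass alphas[i] on support[i].

    Likelihood assumed: consecutive sampling, L(x_n | P) = prod_k prod_j p_kj**f_kj."""
    def log_likelihood(P):
        total = 0.0
        for p_row, f_row in zip(P, F):
            for p, f in zip(p_row, f_row):
                if f == 0:
                    continue
                if p == 0.0:
                    return -math.inf         # the observed sample is impossible under P
                total += f * math.log(p)
        return total

    logs = [math.log(a) + log_likelihood(P) if a > 0 else -math.inf
            for a, P in zip(alphas, support)]
    top = max(logs)
    if top == -math.inf:
        raise ValueError("the sample has zero likelihood under every support point")
    weights = [math.exp(l - top) for l in logs]   # exp(-inf) evaluates to 0.0
    total = sum(weights)
    return [w / total for w in weights]

P1 = [[0.9, 0.1], [0.2, 0.8]]
P2 = [[0.5, 0.5], [0.5, 0.5]]
P3 = [[1.0, 0.0], [0.2, 0.8]]     # rules out the transition 0 -> 1
F = [[9, 1], [2, 8]]              # observed transition counts
w = discrete_posterior([0.4, 0.4, 0.2], [P1, P2, P3], F)
print(w)
```

Only the $m$ weights need to be stored and updated, which is the computational simplification mentioned next.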


This theorem, while almost trivial, is of considerable importance for the solution of Bayesian decision problems in a Markov chain in practice. In many cases it may be feasible to place positive probability on only a finite set of points of $\mathcal{P}_{K \times N}$ and to solve the resulting discrete problem, thus considerably simplifying the computations. We shall not emphasize this consideration any further since most of our results are stated in terms of Stieltjes integrals and, hence, are applicable to discrete, continuous, and mixed prior distributions.


Theorem 2.4.2. Let $\mathcal{H} = \{H(P \mid \tau) \mid \tau \in T\}$ be a family of distributions closed under a given sampling rule and, for a fixed policy $\sigma$, let $\mathcal{H}_\sigma = \{H_\sigma(P(\sigma) \mid \tau) \mid \tau \in T'\}$, where $T' \subset T$, be the corresponding family of marginal distributions of the $N \times N$ stochastic matrix $P(\sigma)$. If the sampling rule is such that, for $n = 1, 2, \dots$, it is possible to observe a sample of size $n$ under the fixed policy $\sigma$, and if the likelihood of a sample observed under the policy $\sigma$ does not depend on elements of $P$ not in $P(\sigma)$, then $\mathcal{H}_\sigma$ is also closed under the given sampling rule.
Proof. Let $L(x_n \mid P)$ be the likelihood function corresponding to the given sampling rule and let $L_\sigma(x_n \mid P(\sigma))$ be the likelihood of the sample $x_n$ from the Markov chain governed by $P(\sigma)$. The hypotheses of the theorem imply that

$$L(x_n \mid P) = L_\sigma(x_n \mid P(\sigma)) \qquad (2.4.3)$$

for all samples $x_n$ from the chain governed by $P(\sigma)$. Let $\mathcal{P}_\sigma$ be the range set of the $(K - N) \times N$ generalized stochastic matrix formed by deleting from $P$ all rows $p_k$ such that $k = \sigma_i$ $(i = 1, \dots, N)$. Then, if $H_\sigma(P(\sigma) \mid \tau)$ is the marginal prior distribution of $P(\sigma)$ and if the sample $x_n$ is observed, the posterior distribution of $P(\sigma)$ is $H_\sigma(P(\sigma) \mid \tau, x_n)$, where

$$dH_\sigma(P(\sigma) \mid \tau, x_n) = \int_{\mathcal{P}_\sigma} dH(P \mid \tau, x_n) = \int_{\mathcal{P}_\sigma} dH(P \mid \tau') = dH_\sigma(P(\sigma) \mid \tau') \qquad (2.4.4)$$

for all $\tau \in T$, where $\tau'$ is defined by equation (2.1.2). Thus, $H_\sigma(P(\sigma) \mid \tau, x_n) \in \mathcal{H}_\sigma$, and $\mathcal{H}_\sigma$ is closed under the sampling rule. Q.E.D.


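The next theorem deals with the continuity of the expectation

$$G(\tau) = \int g(P)\, dH(P \mid \tau)$$

when regarded as a function of $\tau$, where $g(P)$ is a function of $P$.

A distribution function, $H(P \mid \tau)$, is said to be continuous in $\tau$ at a point $\tau^* \in T$ if, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that, for any fixed $P$, $|H(P \mid \tau) - H(P \mid \tau^*)| < \varepsilon$ whenever $\|\tau - \tau^*\| < \delta$. A point, $P \in \mathcal{P}_{K \times N}$, is said to be a continuity point of $H(P \mid \tau)$ if $H(P \mid \tau)$ is a continuous function of $P$ at $P$ for any fixed value of $\tau$. A family, $\mathcal{H}$, of distribution functions is said to be continuous in $\tau$ if each of its members is continuous in $\tau$ at each of its continuity points, $P$.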


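Theorem 2.4.3. Let $\mathcal{H}$ be a family of distribution functions indexed by $\tau \in T$ which is continuous in $\tau$ and let $g(P)$ be any integrable function of $P$ defined on a set $S \subseteq \mathcal{P}_{K \times N}$. If $G(\tau)$ is a function of $\tau$ defined by the integral

$$G(\tau) = \int_S g(P)\, dH(P \mid \tau), \qquad (2.4.5)$$

then $G(\tau)$ is continuous in $\tau$.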
Proof. Let $\tau$ be fixed and let $\{\tau_n\}$ be any sequence of points of $T$ which converges to $\tau$, where $\tau_n \in T$ $(n = 1, 2, \dots)$. Let $H(P \mid \tau_n)$ be the corresponding sequence of distribution functions from $\mathcal{H}$. Since $\mathcal{H}$ is continuous in $\tau$,

$$\lim_{n \to \infty} H(P \mid \tau_n) = H(P \mid \tau) \qquad (2.4.6)$$

at every continuity point of $H(P \mid \tau)$ and, by the Helly–Bray Theorem*,

$$\lim_{n \to \infty} \int g(P)\, dH(P \mid \tau_n) = \int g(P)\, dH(P \mid \tau). \qquad (2.4.7)$$

Thus, for every sequence $\{\tau_n\}$ which converges to $\tau$, $\lim_{n \to \infty} G(\tau_n) = G(\tau)$ and, therefore, $G(\tau)$ is continuous at $\tau$. Q.E.D.


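Corollary 2.4.4. Let $\mathcal{H} = \{H(P \mid \tau) \mid \tau \in T\}$ be a family of distribution functions indexed by $\tau \in T$ which is continuous in $\tau$ and, for a fixed policy $\sigma$, let $\mathcal{H}_\sigma$ be the corresponding family of marginal distribution functions, $H_\sigma(P(\sigma) \mid \tau)$. Then $\mathcal{H}_\sigma$ is a family of distributions continuous in $\tau$.

Proof. In Theorem 2.4.3, let $g(P) = 1$, $P \in S$, where, for fixed $P(\sigma) = [p_{ij}(\sigma)]$,

$$S = \{P \mid P \in \mathcal{P}_{K \times N},\; p_{\sigma_i j} \le p_{ij}(\sigma),\; i, j = 1, \dots, N\}. \qquad (2.4.8)$$

Then

$$H_\sigma(P(\sigma) \mid \tau) = \int_S dH(P \mid \tau), \qquad (2.4.9)$$

and $H_\sigma(P(\sigma) \mid \tau)$ is a continuous function of $\tau$ at $\tau$ for any $P(\sigma) \in \mathcal{P}_{N \times N}$.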


* See, for example, Loève [23], pp. 180–182.

Corollary 2.4.5. Let $\mathcal{H} = \{H(P \mid \tau) \mid \tau \in T\}$ be a family of distributions indexed by $\tau$. Suppose that, for every $H(P \mid \tau) \in \mathcal{H}$, the corresponding density function $h(P \mid \tau)$ exists and that $h(P \mid \tau)$ is a continuous function of $\tau$ for every $P \in \mathcal{P}_{K \times N}$. Then $\mathcal{H}$ is a family of distributions continuous in $\tau$.

Proof. The corollary follows immediately from a well-known theorem of integral calculus which states that, if $h(P \mid \tau)$ is continuous in $\tau$, [...]
CHAPTER 3
ADAPTIVE CONTROL PROBLEMS

3.1 Discounted Processes: Formulation.
Consider a Markov chain with alternatives in which the process, assumed to operate indefinitely, is sampled after each transition; that is, the decision-maker knows the state of the process after each transition. Information about $\tilde P$ is gathered in this manner and the decision-maker may alter the current policy at any time, as dictated by his state of knowledge about $\tilde P$. Such a process is an adaptive control process. It is assumed that any sampling costs are included in the transition reward matrix, $R = [r_{kj}]$. This implies that either the sampling costs [...]

[...] future rewards are discounted to a present value, we shall speak of a discounted adaptive control process. It is this class of problems [...]

The time between two consecutive transitions is assumed to be constant and will be taken as the time unit. Let $\beta$ be the present value of a unit reward received one unit of time in the future $(0 < \beta < 1)$. Since the present value of the maximum possible reward on the $n$th transition in the future decreases as $\beta^n$, it is clear that the total discounted reward earned over an infinite period under any sequence of policies is finite. A natural criterion to use in choosing policies is, therefore, the expected total discounted reward over an infinite period, and we shall define the discounted adaptive control problem to be the problem of selecting a sequence of policies so as to maximize this quantity.
In the present section the discounted adaptive control problem is formulated in terms of a set of simultaneous functional equations. It is shown in the following section that there exists a unique bounded set of continuous solutions to these equations. In Section 3.3 a method of successive approximations is described which converges monotonically and uniformly to this unique set of solutions, and the question of policy convergence is considered. The concept of recursive computation is then introduced and a numerical example is presented. The chapter concludes with a discussion of the problems involved in treating undiscounted adaptive control processes in a Markov chain.
A specific form of the discounted adaptive control problem, the two-armed bandit problem, was treated by Bellman [7] in 1955, using dynamic programming and a beta prior distribution. The method was generalized by Bellman and Kalaba [8] and is summarized by Bellman in Chapter 16 of Adaptive Control Processes [6]. Bellman's method of solution is based upon the use of successive approximations.

Cozzolino [13] applied Bellman's formulation of the two-armed bandit problem to the case of a two-state Markov chain with two alternatives in each state, assuming a matrix beta prior distribution. He mapped decision regions in the parameter space of the prior distribution for the special case of one unknown transition probability vector. Cozzolino, Gonzalez-Zubieta, and Miller [14] have recently suggested various heuristic treatments of the discounted adaptive control problem, basing their results on simulation studies. Freimer [18, 19] has obtained a solution of the discounted adaptive control problem in the case of quadratic cost functions by reducing the stochastic formulation to a deterministic one in terms of certainty equivalents.
The functional equations formulated in this chapter generalize the results of these authors and, in spirit, follow Bellman's derivation [6]. Our contribution to the treatment of this problem consists of the following:

a. Proof of the existence of a unique bounded set of continuous solutions to the functional equations.

b. Derivation of a method of successive approximations which converges monotonically and uniformly to this unique set of solutions.

c. Introduction of recursive computation techniques for the numerical solution of the discounted adaptive control problem.

Let the prior distribution of $\tilde P$, $H(P \mid \tau)$, be a member of a family, $\mathcal{H}$, indexed by $\tau \in T$. The ordered pair $(i, \tau)$, where $i = 1, \dots, N$ and $\tau \in T$, can be regarded as the generalized state of the system. Here $i$ is the physical state of the system and $\tau$ summarizes, or, more precisely, indexes, the decision-maker's state of knowledge about $\tilde P$. We shall refer to $\tau$ as indexing the decision-maker's state of knowledge as sampling progresses.

Let $v_i(\tau)$ denote the supremum of the expected discounted reward over an infinite period when the system starts from the generalized state $(i, \tau)$. Since the rewards $r_{kj}$ are bounded, the discounted total reward under any sampling strategy is bounded in absolute value by

$$\sum_{n=0}^{\infty} \beta^n \max_{k, j} |r_{kj}| = \frac{\max_{k, j} |r_{kj}|}{1 - \beta} \qquad (3.1.1)$$

and, therefore, $v_i(\tau)$ exists for $i = 1, \dots, N$ and all $\tau \in T$. It will be shown at the conclusion of this section that $v_i(\tau)$ is attained under an optimal sampling strategy and, hence, can be regarded as the maximum expected discounted reward when the system starts from $(i, \tau)$.

If, when in state $(i, \tau)$, it is decided to choose the $k$th alternative and the system makes a transition to state $j$, the supremum of the posterior expected discounted reward is

$$r_{kj} + \beta\, v_j(T_{kj}(\tau)), \qquad (3.1.2)$$

where $T_{kj}(\tau) \in T$ indexes the posterior distribution of $\tilde P$ after a transition to state $j$ under alternative $k$.











The probability of the sample outcome $j$, unconditional with regard to the prior distribution of $\tilde P$, given that the system is in state $(i, \tau)$ and that alternative $k$ is used, is

$$\bar p_{kj}(\tau) = \int p_{kj}\, dH(P \mid \tau), \qquad (3.1.3)$$

the marginal prior expectation of $\tilde p_{kj}$. Let

$$\bar q_k(\tau) = \sum_{j=1}^{N} \bar p_{kj}(\tau)\, r_{kj} \qquad (3.1.4)$$

be the mean one-step transition reward when the system is in state $(i, \tau)$ and alternative $k$ is used. Then, regarding each $v_i(\tau)$ as a function of $\tau$ defined on $T$ $(i = 1, \dots, N)$, the supremum of the expected discounted reward when starting from $(i, \tau)$ must satisfy the following set of simultaneous functional equations,

$$v_i(\tau) = \max_{k} \Big[\bar q_k(\tau) + \beta \sum_{j=1}^{N} \bar p_{kj}(\tau)\, v_j(T_{kj}(\tau))\Big], \qquad i = 1, \dots, N; \; \tau \in T, \qquad (3.1.5)$$

the maximum being taken over the alternatives $k$ available in state $i$.
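The recursion in (3.1.5) can be sketched for one concrete family. Purely as an assumption for illustration, the prior is taken to be a matrix beta distribution indexed by its parameter matrix $M$, so that $\bar p_{kj} = m_{kj} / \sum_j m_{kj}$ and $T_{kj}(M)$ adds one to $m_{kj}$; the infinite horizon is truncated after $n$ transitions with constant terminal values $V_i$. This is a hypothetical $n$-step approximation, not necessarily the exact form of the successive approximations used later in the report, and all names are illustrative.

```python
from functools import lru_cache

def make_value_function(alt_rows, r, V, beta):
    """Hypothetical n-step version of (3.1.5) under an assumed matrix beta prior.

    alt_rows[i] lists the rows (alternatives) available in state i, r[k][j] is the
    reward for a transition k -> j, V[i] is the terminal value, and the prior is
    indexed by the parameter matrix M (a tuple of tuples, so it can be cached)."""
    @lru_cache(maxsize=None)
    def v(n, i, M):
        if n == 0:
            return V[i]                      # constant terminal value V_i
        best = float("-inf")
        for k in alt_rows[i]:
            row = M[k]
            m_k = sum(row)
            value = 0.0
            for j, m_kj in enumerate(row):
                p_bar = m_kj / m_k           # marginal prior mean, as in (3.1.3)
                # T_kj(M): posterior parameter matrix after observing k -> j
                M_next = M[:k] + (row[:j] + (m_kj + 1,) + row[j + 1:],) + M[k + 1:]
                value += p_bar * (r[k][j] + beta * v(n - 1, j, M_next))
            best = max(best, value)
        return best
    return v

# One state, two alternatives with rewards 1 and 2: the recursion always picks 2.
v_single = make_value_function([[0, 1]], [[1.0], [2.0]], [0.0], 0.9)
print(v_single(3, 0, ((1,), (1,))))          # 2 + 0.9 * (2 + 0.9 * 2) = 5.42
```

Caching on $(n, i, M)$ reflects the fact that the same generalized state can be reached along different sample paths.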
We now show that there exists a sampling strategy which attains the supremum of the expected discounted reward over an infinite period. In order to do this, it is necessary to define precisely the notion of a sampling strategy for an adaptive control process.
Let the policies, $\sigma \in \Sigma$, be indexed by the integers 0 through $S - 1$, where $S$ is the number of elements in $\Sigma$. Thus, $\Sigma = \{\sigma_0, \sigma_1, \dots, \sigma_{S-1}\}$. Suppose the system starts from the generalized state $(i_0, \tau)$ and that alternative $k$ has been selected in state $i_0$. We can, before the first transition, decide which alternative to use in each state $j$ for the second transition. This consists of the choice of a policy, $\sigma_1 \in \Sigma$, and may be described by a decision function with range $\{0, 1, \dots, S - 1\}$.

In general, before any transitions have occurred, we can prescribe a policy to be used immediately after the $n$th transition $(n = 1, 2, \dots)$. Let $x_n = (i_0, i_1, \dots, i_n)$ be a possible sample history of the first $n$ transitions and let $s_{n-1} = (\sigma_1, \dots, \sigma_{n-1})$ be the sequence of policies under which the sample $x_n$ occurred, the first transition having occurred under the initial alternative $k$. The policy history, $s_{n-1}$, is determined by evaluating the decision functions $d_1(x_1, s_0), d_2(x_2, s_1), \dots, d_{n-1}(x_{n-1}, s_{n-2})$ at $x_1 = (i_0, i_1), \dots, x_{n-1} = (i_0, i_1, \dots, i_{n-1})$. Conditional on the Markov chain having arrived at state $i_n$ with sample history $x_n$ and policy history $s_{n-1}$, we may prescribe the alternative to be used in each state $j$ after the $n$th transition. This amounts to the selection of a policy, $\sigma_n \in \Sigma$, and is denoted by $d_n(x_n, s_{n-1})$, a function with range $\{0, 1, \dots, S - 1\}$. Since there are $N^n$ different sample histories, $x_n$, which start from a fixed state $i_0$, it is necessary to specify $N^n$ values of the $n$th-level decision function $d_n(x_n, s_{n-1})$. The specification of a complete set of decision functions, $d_n(x_n, s_{n-1})$, for $n = 1, 2, \dots$ and for all possible sample histories, together with the choice of an initial alternative in state $i_0$, constitutes a sampling strategy, $d$. Let $D_i$ [...]
[...] If $\{v_i(n, \tau)\}$ is a sequence of functions which converges monotonically to $v_i(\tau)$, then $\{e_i(n, \tau)\}$ is a sequence of functions which converges monotonically to zero. In this case, if $\varepsilon > 0$ is an error bound which is acceptable to the decision-maker and if $n^\circ$ is the smallest positive integer such that

$$|e_i(n^\circ, \tau)| < \varepsilon, \qquad (3.3.10)$$

then $v_i(n^\circ, \tau)$ is an acceptable approximation to $v_i(\tau)$, and the sampling strategy resulting in $v_i(n^\circ, \tau)$ is an acceptable approximation to the first $n^\circ$ levels of an optimal sampling strategy.

It is not necessary to require that the successive approximants $v_i(n, \tau)$ converge monotonically to $v_i(\tau)$ and, in fact, a non-monotonic sequence $\{v_i(n, \tau)\}$ may converge more rapidly than a monotone sequence. Theorem 3.3.3 provides a bound for $e_i(n, \tau)$ assuming nothing about monotonicity. A lemma is first required.


Lemma 3.3.2. Let $r$ and $R$ be defined by equation (3.3.5). Then $v_i(\tau)$, the solution of (3.1.5), has the bounds

$$\frac{r}{1 - \beta} \le v_i(\tau) \le \frac{R}{1 - \beta}, \qquad i = 1, \dots, N; \; \tau \in T; \; 0 < \beta < 1. \qquad (3.3.11)$$

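Proof. The mean reward per transition under any policy has the bounds

$$r \le \bar q_k(\tau) \le R, \qquad k = 1, \dots, K; \; \tau \in T. \qquad (3.3.12)$$

Since the expected discounted reward over an infinite period under any strategy is the sum of the expected rewards at each transition, the reward of the $n$th transition being weighted by $\beta^n$, the maximum total reward over all strategies has the bounds

$$\sum_{n=0}^{\infty} \beta^n r = \frac{r}{1 - \beta} \le v_i(\tau) \le \frac{R}{1 - \beta} = \sum_{n=0}^{\infty} \beta^n R.$$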
Theorem 3.3.3. Let $\{v_i(n, \tau)\}$ be a sequence of successive approximations defined by equations (3.2.1) and (3.2.2), with constant terminal reward functions,

$$v_i(0, \tau) = V_i, \qquad i = 1, \dots, N; \; \tau \in T. \qquad (3.3.14)$$

Let $r$, $\underline{v}$, $\bar v$, and $R$ be defined by (3.3.2) and (3.3.3). Then the error of the $n$th approximant has the bound

$$|e_i(n, \tau)| \le \beta^n \max\Big[\frac{R}{1 - \beta} - \underline{v},\; \bar v - \frac{r}{1 - \beta}\Big], \qquad i = 1, \dots, N; \; n = 0, 1, 2, \dots; \; 0 < \beta < 1. \qquad (3.3.15)$$


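Proof. The proof is inductive. For $n = 0$,

$$e_i(0, \tau) = v_i(\tau) - V_i.$$

Suppose $v_i(\tau) \ge V_i$. Then, by Lemma 3.3.2,

$$|e_i(0, \tau)| = v_i(\tau) - V_i \le \frac{R}{1 - \beta} - \underline{v}, \qquad i = 1, \dots, N. \qquad (3.3.16)$$

If, on the other hand, $v_i(\tau) < V_i$, Lemma 3.3.2 implies

$$|e_i(0, \tau)| = V_i - v_i(\tau) \le \bar v - \frac{r}{1 - \beta}, \qquad i = 1, \dots, N. \qquad (3.3.17)$$

In either case,

$$|e_i(0, \tau)| \le \max\Big[\frac{R}{1 - \beta} - \underline{v},\; \bar v - \frac{r}{1 - \beta}\Big], \qquad i = 1, \dots, N. \qquad (3.3.18)$$

It is to be noted that at least one of the two terms, $R/(1 - \beta) - \underline{v}$ and $\bar v - r/(1 - \beta)$, is nonnegative, for, if not, we have the contradiction

$$\bar v < \frac{r}{1 - \beta} \le \frac{R}{1 - \beta} < \underline{v} \le \bar v.$$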
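Having established that equation (3.3.15) is valid for $n = 0$, assume it holds for $n$. Since $v_i(\tau)$ and $v_i(n + 1, \tau)$ are maxima over $k$ of expressions that differ by $\beta \sum_j \bar p_{kj}(\tau)\, e_j(n, T_{kj}(\tau))$,

$$\min_k \Big[\beta \sum_{j=1}^{N} \bar p_{kj}(\tau)\, e_j(n, T_{kj}(\tau))\Big] \le e_i(n + 1, \tau) \le \max_k \Big[\beta \sum_{j=1}^{N} \bar p_{kj}(\tau)\, e_j(n, T_{kj}(\tau))\Big]. \qquad (3.3.19)$$

Hence, since $\sum_j \bar p_{kj}(\tau) = 1$, the induction hypothesis gives

$$|e_i(n + 1, \tau)| \le \beta^{n+1} \max\Big[\frac{R}{1 - \beta} - \underline{v},\; \bar v - \frac{r}{1 - \beta}\Big]. \qquad (3.3.20)$$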
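The bound (3.3.15) gives a conservative way to choose the number of approximation steps in advance. The sketch assumes that $\underline{v}$ and $\bar v$ are the smallest and largest terminal constants $V_i$, and that $r$ and $R$ are the smallest and largest one-step rewards; this is consistent with the proof above but is an assumption here, since (3.3.2) and (3.3.3) are not reproduced. Because it uses the bound rather than the actual error, the $n$ it returns is an upper estimate of the $n^\circ$ defined earlier in this section. The function names are hypothetical.

```python
def error_bound(n, beta, r, R, V):
    """Right-hand side of (3.3.15) for constant terminal values V = [V_1, ..., V_N]."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie in (0, 1)")
    v_low, v_high = min(V), max(V)      # assumed meaning of the lower and upper constants
    rho = max(R / (1 - beta) - v_low, v_high - r / (1 - beta))   # nonnegative, as shown above
    return beta ** n * rho

def steps_for_tolerance(eps, beta, r, R, V):
    """Smallest n whose bound (3.3.15) falls below eps (an upper estimate of n°)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = 0
    while error_bound(n, beta, r, R, V) >= eps:
        n += 1
    return n

# Rewards between r = 0 and R = 1, zero terminal values, beta = 0.9:
print(steps_for_tolerance(0.01, 0.9, 0.0, 1.0, [0.0, 0.0]))   # 66
```

Here the bound starts at $R/(1-\beta) = 10$ and shrinks by the factor 0.9 per step, so 66 steps are needed to bring it below 0.01.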


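Theorem 3.3.4. Let the terminal functions $V_i(\tau)$ be constants,

$$V_i(\tau) = V_i, \qquad i = 1, \dots, N. \qquad (3.3.21)$$

If

$$\bar v \le \frac{r}{1 - \beta}, \qquad (3.3.22)$$

the error of the $n$th approximant has the bounds

$$0 \le e_i(n, \tau) \le \beta^n \Big(\frac{R}{1 - \beta} - \underline{v}\Big). \qquad (3.3.23)$$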
Proof. If equation (3.3.22) is satisfied, Theorem 3.3.1 implies that $e_i(n, \tau) \ge 0$. Moreover, (3.3.22) implies that $\underline{v} \le \bar v \le r/(1 - \beta) \le R/(1 - \beta)$; hence $R/(1 - \beta) - \underline{v} \ge 0 \ge \bar v - r/(1 - \beta)$. Equation (3.3.15) then yields the upper inequality of (3.3.23). Similarly, if equation (3.3.24) is satisfied, then $\bar v \ge \underline{v} \ge R/(1 - \beta) \ge r/(1 - \beta)$; hence $\bar v - r/(1 - \beta) \ge 0 \ge R/(1 - \beta) - \underline{v}$. The bounds of (3.3.25) then follow from Theorem 3.3.1 and equation (3.3.15). Q.E.D.


Let there correspond to the $n$th approximant $v_i(n, \tau)$, defined by (3.2.1) and (3.2.2), the $n$-step optimal sampling strategy $d^\circ(n)$. At least one such optimal strategy exists since there are a finite number of different sampling strategies for the $n$-step problem; there may be more than one $n$-step optimal strategy. The next theorem demonstrates that, as $n \to \infty$, any $n$-step optimal strategy converges to an optimal sampling strategy for the adaptive control model of equation (3.1.5).

We must first define precisely what is meant by convergence of a sampling strategy.

Let the generalized state $(i, \tau)$ be fixed. To every $n$-step sampling strategy $d(n)$ there corresponds an ordered pair $(k_n, y_n)$, where $k_n$ is the initial alternative selected in state $i$ and $y_n \in [0, 1]$ is defined by equation (3.1.23) with $d_m(x_m, s_{m-1}) = 0$ for $m > n$. Let $d^* = (k^*, y^*)$ be a sampling strategy for the infinite horizon model of (3.1.5). Then we say

$$\lim_{n \to \infty} d(n) = d^*$$

if, given $\varepsilon > 0$, there is a positive integer $\nu$ such that, for all $n > \nu$, $k_n = k^*$ and $|y_n - y^*| < \varepsilon$. This definition implies that, given an arbitrarily large positive integer $\lambda$, there exists an integer $\nu$ such that, for all $n > \nu$, the values of the decision functions on the first $\lambda$ levels of $d(n)$ are equal to the values of the decision functions on the first $\lambda$ levels of $d^*$.
















Theorem 3.3.5. Let the generalized state $(i, \tau)$ be fixed and let $D_i^\circ$ denote the set of optimal sampling strategies for the adaptive control problem of equation (3.1.5). If $d^\circ(n)$ is an $n$-step optimal sampling strategy for the problem defined by (3.2.1) and (3.2.2), then

$$\lim_{n \to \infty} d^\circ(n) = d^*, \qquad (3.3.26)$$

where $d^* \in D_i^\circ$.
Proof. Let $k_n$ denote the initial alternative selected in the $n$-step optimal sampling strategy $d^\circ(n)$, and let $K_i^\circ$ be the set of alternatives in state $i$ which are initial selections for an optimal strategy $d^\circ \in D_i^\circ$. We first show that $\lim_{n\to\infty} k_n \in K_i^\circ$.

Using the notation of equations (3.2.3) and (3.2.4),

$$ v_i(n+1,\Psi) = S_i^{k_n}(v; n, \Psi), \qquad (3.3.27) $$

and, for any $k^\circ \in K_i^\circ$ and $k \notin K_i^\circ$,

$$ v_i(\Psi) = S_i^{k^\circ}(v,\Psi) > S_i^{k}(v,\Psi). \qquad (3.3.28) $$

Assume that $\{k_n\}$ does not converge to a member of $K_i^\circ$. Then there is a subsequence $\{k_{n_\nu}\}$ such that

$$ k_{n_\nu} \notin K_i^\circ, \qquad \nu = 1, 2, \ldots \qquad (3.3.29) $$

Let $\varepsilon$ be chosen such that

$$ 0 < \varepsilon < \min_{k \notin K_i^\circ} \big[ v_i(\Psi) - S_i^k(v,\Psi) \big]. \qquad (3.3.30) $$

Since $\lim_{n\to\infty} v_i(n,\Psi) = v_i(\Psi)$, we have, by (3.3.27),

$$ \big| v_i(\Psi) - S_i^{k_{n_\nu}}(v; n_\nu, \Psi) \big| < \varepsilon/2 \qquad (3.3.31) $$

for all $\nu$ sufficiently large. By using (3.2.3) and (3.2.4),















and there exists an integer $\nu$ such that





$$ \big| S_i^{k}(v; n, \Psi) - S_i^{k}(v, \Psi) \big| < \varepsilon/2, \qquad n > \nu. \qquad (3.3.33) $$

Thus, combining (3.3.31) and (3.3.33), there exists an integer $\nu$ such that $k_{n_\nu} \notin K_i^\circ$ and

$$ \big| v_i(\Psi) - S_i^{k_{n_\nu}}(v, \Psi) \big| < \varepsilon, \qquad (3.3.34) $$

contradicting (3.3.30). It follows that $\lim_{n\to\infty} k_n = k^\circ$ exists and that $k^\circ \in K_i^\circ$.

Given a positive integer $\lambda$, the same proof can be applied to each selection of an alternative in the first $\lambda$ levels of the sampling strategy $d^\circ(n)$. Since, having fixed $\lambda$, there are a finite number of such alternatives, there exists a positive integer $\nu$ such that, for all $n > \nu$, the decision functions in the first $\lambda$ levels of $d^\circ(n)$ have the same values as the corresponding decision functions in some strategy $d^\circ \in D_i^\circ$. Q.E.D.


Equations (3.2.1) and (3.2.2) are part of a class of recursive equations which appear throughout this report. It is to be noted that, although these equations resemble a classical iterative formula for successive approximations, $v_i(n,\Psi)$ is computed not in terms of $v_i(n-1,\Psi)$ but in terms of $v_j(n-1, T_{ij}^k(\Psi))$. Computation of $v_i(n,\Psi)$ for a specific value of $(i,\Psi)$ involves the evaluation of between $(k_1 N)^n$ and $(k_2 N)^n$ terminal values $v_j(0,\Psi')$, where $k_1 = \min_i K_i$ and $k_2 = \max_i K_i$.
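These bounds can be checked by counting the leaves of the evaluation tree directly. The sketch below assumes, for illustration only, that every level branches on each of the $K_i$ alternatives of the current state and then on each of the $N$ successor states, so that the count from state $i$ satisfies $c_i(n) = K_i \sum_j c_j(n-1)$ with $c_i(0) = 1$. The function name `terminal_value_counts` and the values of `K` are hypothetical.

```python
def terminal_value_counts(K, n):
    """Leaves of the n-level evaluation tree, one count per starting state.

    K[i] is the number of alternatives in state i. Each level branches on an
    alternative k (K[i] ways) and then on a successor state j (N = len(K) ways).
    """
    counts = [1] * len(K)              # level 0: each v_j(0, Psi') is one terminal value
    for _ in range(n):
        total = sum(counts)            # sum over successor states j of c_j(n - 1)
        counts = [k_i * total for k_i in K]
    return counts

K = [2, 3]                             # hypothetical K_1, K_2 with N = 2
print(terminal_value_counts(K, 4))     # -> [500, 750]
```

With $K = (2, 3)$ and $n = 4$ the counts lie between $(2 \cdot 2)^4 = 256$ and $(3 \cdot 2)^4 = 1296$. The function counts evaluations, that is, tree leaves. The number of distinct pairs $(j, \Psi')$ can be smaller when different sample paths lead to the same posterior.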
One way to compute $v_i(n,\Psi)$ is as follows. First evaluate and store all required values of $v_j(0,\Psi')$, $j = 1, \ldots, N$. Then compute and store all required values of $v_j(1,\Psi')$, using the results of the previous computation of $v_j(0,\Psi')$. In general, for $\nu = 1, 2, \ldots, n-1$, $v_j(\nu,\Psi')$ is computed in terms of a grid of values of $v_j(\nu-1,\Psi')$ and is stored for use at the next stage in computing $v_j(\nu+1,\Psi')$.

Since the number of terminal values $v_j(0,\Psi')$ which are needed grows exponentially with $n$, considerable storage capacity is clearly required. For even moderately large $n$, tape or disc storage must be used. Moreover, a fairly complex indexing routine must be programmed in


















An alternative approach is to evaluate $v_i(n,\Psi)$ recursively. Under this method, computation starts with the $n$th level rather than the zero-th level, in order to evaluate $v_i(n,\Psi)$ for some pair $(i,\Psi)$. This level of computation is suspended when a value at the $\nu$th level, $v_j(\nu,\Psi')$, is required. Certain key portions of the $(\nu+1)$th level of computation are stored on a push-down list, and the routine then calls itself, entering the $\nu$th level of computation to evaluate $v_j(\nu,\Psi')$. Recursion is halted at the zero-th level when $v_j(0,\Psi')$ is computed. The results of lower-level computations are then fed back, in succession, to higher levels. Having obtained the value of $v_j(\nu,\Psi')$ in this manner, the $(\nu+1)$th level of computation reclaims its partially completed calculations from the push-down list and completes them. This succession of events continues until $v_i(n,\Psi)$ is evaluated.
The storage requirements for recursive calculation of $v_i(n,\Psi)$ consist only of the space needed to store intermediate computations on the push-down list. They therefore increase linearly with $n$. The recursive method thus has the advantage of requiring considerably less storage than the first method described. Since specific values of $v_j(\nu,\Psi')$, for $\nu = 0, 1, \ldots, n-1$, may have to be recalculated many times in the recursive method, we are essentially trading running time for storage. It should be noted, however, that if the first method requires tape handling, the recursive method may reduce overall running time.
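A small sketch makes the trade between storage and running time visible. For illustration it assumes that (3.2.1) and (3.2.2) take the form $v_i(n,\Psi) = \max_k \sum_j \bar p_{ij}^k(\Psi)\,[r_{ij}^k + \beta\, v_j(n-1, T_{ij}^k(\Psi))]$ with $v_j(0,\cdot) = 0$. It further assumes that $\Psi$ is a table of matrix beta counts $m_{ij}^k$, with $\bar p_{ij}^k = m_{ij}^k / \sum_l m_{il}^k$, and that $T_{ij}^k$ adds one to $m_{ij}^k$. The rewards, counts and names (`make_solver`, `update`) are hypothetical. The plain solver stores nothing but the call stack. The memoized solver, like the first method, keeps every value it computes.

```python
from functools import lru_cache

# Hypothetical data (not the report's example): 2 states, 2 alternatives each.
# R[i][k][j] is the reward for a transition i -> j under alternative k of state i.
R = (((3.0, 1.0), (2.0, 2.5)),
     ((0.0, 4.0), (1.0, 3.0)))
BETA = 0.5

def update(m, i, k, j):
    """T_ij^k(Psi): matrix beta counts after one observed i -> j under alternative k."""
    new = [[list(row) for row in state] for state in m]
    new[i][k][j] += 1
    return tuple(tuple(tuple(row) for row in state) for state in new)

def make_solver(memoize):
    """Return (v, counter); v(i, n, m) evaluates the n-level recursion from (i, Psi = m)."""
    counter = {"evaluations": 0}

    def v(i, n, m):
        counter["evaluations"] += 1
        if n == 0:
            return 0.0                      # terminal values v_j(0, Psi) = 0
        best = float("-inf")
        for k, row in enumerate(m[i]):      # row = counts m_i.^k for alternative k
            total = sum(row)
            value = sum(
                (row[j] / total) * (R[i][k][j] + BETA * v(j, n - 1, update(m, i, k, j)))
                for j in range(len(row))
            )
            best = max(best, value)
        return best

    if memoize:
        v = lru_cache(maxsize=None)(v)      # recursive calls now go through the cache
    return v, counter

M0 = (((1, 1), (2, 1)), ((1, 3), (1, 1)))   # hypothetical prior counts m[i][k][j]
v_rec, c_rec = make_solver(memoize=False)
v_tab, c_tab = make_solver(memoize=True)
print(v_rec(0, 3, M0), c_rec["evaluations"])
print(v_tab(0, 3, M0), c_tab["evaluations"])
```

With two alternatives and two successor states, the plain recursion makes $1 + 4 + 16 + 64 = 85$ evaluations for $n = 3$. The memoized version makes fewer, because different sample paths (for example $0 \to 1 \to 0 \to 0$ and $0 \to 0 \to 1 \to 0$ under the same alternatives) reach the same counts and the same final state.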
The general theory of recursive computation is described by McCarthy [ ]. Programming languages of the ALGOL family [32] are capable of recursive computations, as are most list-processing languages, and it is possible to do recursive programming in FORTRAN II [ ]. The recursive programs written for this report used the MAD language [3]. Using the recursive method, a program was written to evaluate equations (3.2.1) and (3.2.2) for specific pairs $(i,\Psi)$ when $P$ has the matrix beta distribution. This program is contained in Appendix 2. Some numerical results obtained from the program are presented in the
Assume that the prior distribution of $P$ is a matrix beta distribution with parameter
Mm = {0.0782 0.0370 (32502) 
a 4.0078 Os 
065526 3e 
0.2888 0.1132 
© = [0.667 0.333] - (3.5.3) 
06759 G0250 
0.125 075 
0625 0.375 
a p= Crys Tiss Bia Boe Pag So Pog? Feo) 
pee Be 
— netriz of this detritution ie 
(350%) 
9-20 -0-20 | 
Oo 
| or on 
9.080 «0.000 
=3,080 06.080 
6.029 0.020 
©) 0.0%) 6.020 
0.289 -(f.280 | 
= 160 Go300 | 
Let the discount factor be $\beta = 0.2$.
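Table 3.5.1 lists values of

$$ v(n,\Psi) = \big(v_1(n,\Psi),\, v_2(n,\Psi)\big) \qquad (3.5.5) $$

computed with the terminal values

$$ v_i(0,\Psi) = 0.000, \qquad i = 1, 2. \qquad (3.5.6) $$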
n | v(n,Ψ) | δ°(n) | e(n,Ψ) | Δ(n)
0 0.000 mee 10.587 20.000 
0.000 eee 23.667 
i 8.002 bs 20585 & O08 
40 6999 2 2.6658 
2 | i008 4 0.875 9.800 
| £3,212 2 0.555 
3 10.418 4 0.099 0.160 
: 23.962 2 62205 
 & 10 499 i 6.038 0.03? 
- 13.646 ? 0.022 
5 | 10.5% t 0,003 9.006 
3 13.66% 2 0.003 
6 | 40.547 | i 0.000 | 9.004 
4 13.667 2 0.000 
6 20.2 | | UM) = 0.000 


0.080 
Computation time: 5 minutes.
307 D
307 


8.752 
12.789 


16.492 
13.262 


10 448 
13.592 


40.505 
13,652 


40.525 
13.665 


40.527 
13.667 


Computation times 


263. 


= (n) alr, MM) 

vie 6.767 
4 1.765 
2 4.918 
4 00325 
2 0.405 
4 0.069 
2 0.075 
4 0.012 
? 6.045 
a 0.002 
2 0.002 
2 9.080 


2 6.000 


0.130 


0.026 


0.005 


9.002 


Uy) = 3.720 
4 307 
Table 3.5.2
Since $r \ge 0$, the convergence is monotone increasing. Convergence to two decimal places has occurred by the sixth iteration.
The optimal initial policy vector,

$$ \delta^\circ(n) = \begin{bmatrix} \delta_1^\circ(n) \\ \delta_2^\circ(n) \end{bmatrix}, \qquad (3.5.7) $$














is recorded in the third column, where $\delta_i^\circ(n)$ is the initial decision when the system starts from state $i$ and the $n$-step optimal sampling strategy $d^\circ(n)$ is used.

Using $v(6,\Psi)$ as the limiting vector $v(\Psi)$, the error vector of the $n$th iterate, $e(n,\Psi)$, was computed as

$$ e(n,\Psi) = \begin{bmatrix} 10.527 \\ 13.667 \end{bmatrix} - v(n,\Psi) \qquad (3.5.8) $$


and is displayed in the fourth column of Table 3.5.1. The last column of the table contains the error bound
Ata) oe" ( me(ppev, Vie ah] 
as defined by equation (3.3.25). Note that the bound $\Delta(n)$ accurately predicts, in this example, the number of iterations required for two-place accuracy.


Vi) 2 ie = 3675. Anika? (30509) 
The convergence is still monotone increasing, and the error functions $e_i(n,\Psi)$ are reduced to approximately 2/3 of the corresponding values in Table 3.5.1. Five iterations are required in this case for two-place accuracy.


In Table 3.5.3, the terminal functions are the maximum expected discounted rewards when the system is operated indefinitely under a single policy (in this case, the policy (1,2); cf. Section 4.3),
ican Fw) 
Convergence is monotone after the first iteration. The error vector is significantly smaller than the corresponding entries in Tables 3.5.1 and 3.5.2. Four iterations are needed to obtain two-place accuracy in this instance.
























3.6 Undiscounted Processes.
When the discount factor is unity, maximizing the expected reward over an infinite period is no longer a useful criterion. With the possible exception of a set of sample histories of measure zero, the expected reward over an infinite period under any strategy diverges to $+\infty$ or $-\infty$. An alternative criterion is to maximize the expected rate at which the process earns rewards in the steady state, that is, the expected gain of the process. This criterion is not really precise either. In the adaptive control process the decision-maker can change alternatives in any state at any time, and it is not certain that a steady state will ever be reached. Moreover, among the strategies which do lead to a steady state and which maximize the gain, there are an arbitrarily large number which are virtually equivalent. These are the strategies in which each alternative is sampled a large (but finite) number of times, after which a fixed policy is chosen under almost perfect information. Further remarks on this class of strategies will be made in Section 5.5.
Since, for each $\beta \in (0,1)$, there is a well-defined criterion leading to an optimal policy, an alternative approach to the undiscounted process is to let $\beta \to 1$ in the discounted adaptive control problem as formulated
we shall call it an optimal initial policy for the undiscounted adaptive control problem. The existence and nature of optimal policies as defined by (3.6.1) are matters for future investigation. Blackwell [24] and Derman [16] have used this approach for undiscounted decision problems.
CHAPTER 4
EXPECTED STEADY-STATE PROBABILITIES
AND RELATED QUANTITIES
Consider a Markov chain with alternatives which is operated under a fixed policy, $\delta$. Let $P$ denote the $N \times N$ matrix of transition probabilities, assumed to have the prior distribution $F(P|\Psi)$.

In this chapter we examine some functions of $P$ which are important in decision problems. Particular attention is given to computing the means, variances and covariances of these quantities.
Section 4.1 deals with the $n$-step transition probabilities and with the expected discounted reward over $n$ transitions. The second section is concerned with the steady-state probability vector. In Section 4.3 we consider the expected discounted reward over an infinite number of transitions when a fixed policy, $\delta$, is used. The final section presents some results concerning the expected reward per transition, or process gain.

Let $R = R(\delta) = [r_{ij}]$. In most cases the dependence of various functions on $\delta$ will not be made explicit, in order to simplify the notation.
If $P$ is a stochastic matrix governing the transitions in a Markov chain, then the probability that the system is in state $j$ after $n$ transitions, given that it started in state $i$, is the $(i,j)$th element of the $n$th power of $P$. This probability is denoted $p_{ij}^{(n)}$. When $P$ is a random matrix, $p_{ij}^{(n)}$ is a random variable. In this section we derive expressions for the expected value of $p_{ij}^{(n)}$ and for the covariance of $p_{ij}^{(n)}$ and $p_{kl}^{(n)}$. We also examine a related quantity, the expected discounted reward over $n$ transitions. Silver, using different methods, has considered the expected value of $P^n$ under a matrix beta prior distribution for $P$, and has presented numerical results
where $R = [r_{ij}]$ is the reward matrix.
Proof. For $n = 1, 2, \ldots$, and all $P$, $v^{(n)}(\beta, P)$ satisfies the difference equation,
which is (4.1.9a). Since 4°(B, +) 40 4, (1) as defined by equation (Fok olb), equation (4.1.9b) follows. Q.E.D.
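For a single known matrix $P$ the difference equation of this proof can be iterated directly. The sketch assumes it has the familiar form $v^{(n+1)} = q + \beta P v^{(n)}$, with $v^{(0)} = 0$ and $q_i = \sum_j p_{ij} r_{ij}$. It checks the iteration against the matrix-power expression $\sum_{m<n} \beta^m P^m q$, which uses the $n$-step transition probabilities $p_{ij}^{(m)}$, and against the limit $(I - \beta P)^{-1} q$. The numbers are hypothetical. When $P$ has a prior distribution the quantities of interest are the expectations of these functions, which this sketch does not compute.

```python
import numpy as np

# Hypothetical fixed-policy chain: transition matrix P and reward matrix Rw.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
Rw = np.array([[5.0, 1.0],
               [2.0, 3.0]])
beta = 0.9
q = (P * Rw).sum(axis=1)          # q_i = sum_j p_ij r_ij, expected one-step reward

def n_step_reward(n):
    """Iterate v(m+1) = q + beta P v(m) from v(0) = 0 and return v(n)."""
    v = np.zeros_like(q)
    for _ in range(n):
        v = q + beta * P @ v
    return v

def n_step_reward_powers(n):
    """The same quantity written as sum_{m=0}^{n-1} beta^m P^m q."""
    total = np.zeros_like(q)
    for m in range(n):
        total += beta**m * np.linalg.matrix_power(P, m) @ q
    return total

v_limit = np.linalg.solve(np.eye(2) - beta * P, q)   # limit as n -> infinity
print(n_step_reward(10), n_step_reward_powers(10), v_limit)
```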
$F(P|\Psi)$. In a sample of $n$ observations let $f_{ij}$ be the number of transitions observed from state $i$ to state $j$, and let the $N \times N$ matrix $F = [f_{ij}]$ be the transition count of the sample. We regard the sample $F$ as a random matrix. Given the initial state, the number of transitions $n$ and the prior distribution of $P$, we can find the distribution of $F$ unconditional with respect to $P$. Let $\bar F = [\bar f_{ij}]$ be the mean of this unconditional sampling distribution. Then
distribution, which is discussed in Section 6.5.
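The transition count of a sample is easy to form from an observed state sequence. The sketch also shows the standard conjugate update for independent Dirichlet (multivariate beta) rows, in which the posterior parameter is the prior count matrix plus $F$. Treating this as the report's matrix beta update is an assumption here. The path and prior counts are hypothetical.

```python
import numpy as np

def transition_count(path, n_states):
    """F = [f_ij]: number of observed transitions from state i to state j."""
    F = np.zeros((n_states, n_states), dtype=int)
    for i, j in zip(path[:-1], path[1:]):
        F[i, j] += 1
    return F

path = [0, 0, 1, 0, 1, 1, 1, 0]          # hypothetical observed states (0-based), n = 7
F = transition_count(path, 2)
M_prior = np.array([[2.0, 1.0],
                    [1.0, 7.0]])         # hypothetical prior parameter, one row per state
M_post = M_prior + F                     # conjugate update for Dirichlet rows (assumption)
P_mean_post = M_post / M_post.sum(axis=1, keepdims=True)
print(F)
print(P_mean_post)
```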


4.2 The Steady-State Probability Vector.

Let $P$ be an ergodic stochastic matrix. Associated with $P$ is a unique vector of steady-state probabilities, $\pi(P) = (\pi_1(P), \ldots, \pi_N(P))$, where $\pi_j = \pi_j(P)$ is the steady-state probability that the system is in state $j$ $(j = 1, 2, \ldots, N)$. The vector $\pi$ satisfies the following system of simultaneous equations,
If $P$ is a random matrix with an arbitrary distribution function $F(P|\Psi)$ which satisfies a mild continuity condition, we show below that it is meaningful to speak of the random vector $\pi$.

In this section we are chiefly concerned with the expected value of $\pi$, $\bar\pi(\Psi) = (\bar\pi_1(\Psi), \ldots, \bar\pi_N(\Psi))$. It is shown that this expectation exists. We then assume that $F(P|\Psi) \in H$, a family of distributions closed under the consecutive sampling rule, and obtain a functional equation for $\bar\pi(\Psi)$. Methods of successive approximations
4.2.1 Existence of the Moments of $\pi$.

Let us now consider conditions on the prior distribution of $P$ which ensure the existence of the general joint moment of the elements of $\pi$,
where the $a_j$ are nonnegative integers. Let

$$ \mathcal{S} = \Big\{ P = [p_{ij}] \;\Big|\; 0 < p_{ij} < 1,\ \sum_{j} p_{ij} = 1 \quad (i, j = 1, 2, \ldots, N) \Big\} \qquad (4.2.4) $$

be the set of all positive stochastic matrices and, for $0 < a < 1$, define the set

dy as De # S(a)   (4.2.5a)
which includes all periodic and multiple-chain transition matrices, as well as those single-chain transition matrices which have transient states, is contained within $\mathcal{J} - \mathcal{S}$. Lemma 4.2.1 therefore means that, provided $F(P|\Psi)$ is continuous on the boundary of $\mathcal{J}$, we need only consider transition matrices in $\mathcal{S}$. In that case, with probability one, $\pi$ exists and, moreover, $\pi_j > 0$ $(j = 1, \ldots, N)$.
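The sketch below illustrates both objects under stated assumptions. `steady_state` solves $\pi P = \pi$, $\sum_j \pi_j = 1$ for an ergodic $P$ by least squares on the stacked linear system. `expected_steady_state` estimates $\bar\pi(\Psi)$ by Monte Carlo, drawing the rows of $P$ as independent Dirichlet vectors whose parameters are the rows of a hypothetical count matrix `M`. Such draws are positive matrices with probability one, so $\pi$ exists for every draw. Monte Carlo is used here only for illustration. It is not the method of successive approximations developed in this section.

```python
import numpy as np

def steady_state(P):
    """Solve pi P = pi with sum(pi) = 1 for an ergodic stochastic matrix P."""
    N = P.shape[0]
    A = np.vstack([P.T - np.eye(N), np.ones((1, N))])   # (P^T - I) pi = 0 and 1' pi = 1
    b = np.zeros(N + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def expected_steady_state(M, samples=2000, seed=0):
    """Monte Carlo estimate of E[pi] when row i of P ~ Dirichlet(M[i]) independently."""
    rng = np.random.default_rng(seed)
    total = np.zeros(M.shape[0])
    for _ in range(samples):
        P = np.array([rng.dirichlet(row) for row in M])
        total += steady_state(P)
    return total / samples

M = np.array([[2.0, 1.0],
              [3.0, 1.0]])                  # hypothetical Dirichlet parameters
P_bar = M / M.sum(axis=1, keepdims=True)    # prior mean of P
print(expected_steady_state(M), steady_state(P_bar))
```

The two printed vectors need not coincide, since $\pi(P)$ is a nonlinear function of $P$.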
and, since $\varepsilon$ is arbitrary, $\mathcal{J} - \mathcal{S}$ is a set of measure zero, proving sufficiency.

To demonstrate necessity, it suffices to note that, if there does
function is continuous on $\mathcal{J}$. Therefore, the family of matrix beta
used to obtain a method of successive approximations for the numerical calculation of the expected value of $V_i(P)$. Some examples are presented
4.3.1 Expected Value of $V(P)$. Let $P$ have the prior distribution
Equation (4.3.5) has the same form as equation (3.1.5). It may be interpreted as a discounted adaptive control equation in which there is exactly one alternative in each state. The results of Sections 3.2 and 3.3
4.4.1 Mean and Variance of $g(P)$. Let the expected value of $g(P)$ be


Equation (4.4.1) shows that $g(P)$ is continuous and bounded and hence integrable on $\mathcal{S}$. If $F(P|\Psi)$ is continuous on the boundary of $\mathcal{J}$, then Lemma 4.2.1 implies the existence of the integral (4.4.2).
of distributions continuous on the boundary of $\mathcal{J}$ which is closed under consecutive sampling, then the expected gain $\bar g(\Psi)$ is given by
continuous functions of $\Psi$
is the mean one-step transition reward.
The bound (4.4.12) is conservative. Tighter bounds require a tighter bound on $|p_{ij}^{(n)} - \pi_j|$ than that provided by (4.4.11). This is a problem for future investigation.

The covariance between ¥, 8) ond v2) is given by equation (413.22). 
4.4.3 An Abelian Theorem. We conclude this chapter with a theorem which relates $g(P)$ to $V(P)$. It requires some results from the theory of summation of divergent series, which are summarized here without proof.
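The relation in question can be illustrated numerically for a single known ergodic $P$ under a fixed policy, using hypothetical numbers. The gain is $g = \sum_j \pi_j q_j$, and the standard Abelian limit for an ergodic finite chain is $\lim_{\beta \to 1} (1-\beta)\, V_i(\beta) = g$ for every starting state $i$, where $V(\beta) = (I - \beta P)^{-1} q$. The surrounding text does not confirm that this is exactly the form of the theorem that follows, so treat the sketch as an illustration only.

```python
import numpy as np

# Hypothetical ergodic chain under a fixed policy, with expected one-step rewards q.
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
q = np.array([1.0, 3.0])

# Two-state steady state: pi_1 = p_21 / (p_12 + p_21), pi_2 = p_12 / (p_12 + p_21).
pi = np.array([P[1, 0], P[0, 1]]) / (P[0, 1] + P[1, 0])
g = pi @ q                                   # process gain, here 17/7

def scaled_value(beta):
    """(1 - beta) V(beta), where V(beta) = (I - beta P)^{-1} q."""
    V = np.linalg.solve(np.eye(2) - beta * P, q)
    return (1.0 - beta) * V

for beta in (0.9, 0.99, 0.999):
    print(beta, scaled_value(beta), g)
```

For this chain the error $(1-\beta)V(\beta) - g$ shrinks in proportion to $(1-\beta)/(1-0.3\beta)$, where $0.3$ is the second eigenvalue of $P$.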
CHAPTER 5
TERMINAL CONTROL PROBLEMS














In this chapter we consider sequential sampling models of a Markov chain with alternatives in which there is an explicit sampling cost. This leads to a distinction between sampling the process and using the process. When the process is sampled, the sequence of states occupied by the Markov chain during the sampling period is made known to the decision-maker, who then uses this information to update his prior distribution on $P$. During the sampling period the process earns transition rewards as specified by $R$, and sampling costs are incurred.
On the other hand, if the process is used over a period of $n$ transitions, it earns transition rewards, but the decision-maker is permitted to know only the initial state and the final state of the sample sequence. The only sampling cost incurred is the cost of observing the state of the system after the $n$th transition.
It is reasonable to expect that, after a finite amount of sampling, the prior distribution of $P$ will be sufficiently tight that the decision-maker's best course of action is to cease sampling and operate the process indefinitely under some fixed terminal policy. Hence, the models of this chapter are called terminal control models. In the following sections we show that such a terminal decision point occurs with probability one in an optimal sampling strategy.

Terminal control models are applicable, in general, to any Markov chain with alternatives in which two conditions hold. First, rewards are earned independently of the decision-maker's knowledge of the sequence of states occupied by the system. Second, the decision-maker can determine the state of the system at any time for a nonzero cost. A specific example of such a process is a Markov chain model of consumer brand-switching behavior, where a survey must be made to determine the current state of the market.

Two-action sequential sampling problems with independent, identically distributed observations have been examined from the Bayesian point of view by Wetherill [40]. A similar problem with Markov-dependent observations was recently considered by Bast [9].

In Section 5.1 we examine Model I, a discounted terminal control model in which the decision-maker must sample at every transition of the process until a terminal decision point is reached. This model is formulated as a set of functional equations, and it is shown that a terminal decision point is reached with probability one in an optimal sampling strategy. Section 5.2 shows that there exists a unique solution to these equations and introduces a method of successive approximations. This model is generalized in Section 5.3, where Model II is introduced. Model II is a discounted terminal control model in which the decision-maker can either sample or use the process until a terminal decision is made. Approximate methods of making terminal decisions are discussed in Section 5.4. Models of undiscounted processes are introduced in Section 5.5, and the chapter concludes with a brief discussion of set-up costs.






















5.1 Discounted Processes. Model I.

Consider a Markov chain with alternatives which has the reward matrix $R = [r_{ij}^k]$. At each transition the decision-maker can either sample the process or choose a terminal policy under which the system is to be operated indefinitely. Let $c_i > 0$ be the cost of observing the system and finding it in state $i$ $(i = 1, \ldots, N)$. Before it is executed, the cost of any sampling strategy is therefore a random variable. Assuming that the interval between transitions is constant, we may use this interval as the unit of time. Let $\beta$ be the present value of a unit reward received one unit of time in the future $(0 < \beta < 1)$. We seek a sampling strategy which maximizes the expected total discounted reward over an infinite period.

When the decision-maker chooses to sample we clearly have a case of consecutive sampling. Thus, it is assumed that the prior distribution function of $P$ is $F(P|\Psi) \in H$, a family of distributions closed under consecutive sampling. Let $(i,\Psi)$ denote the generalized state of the system $(i = 1, \ldots, N)$, and let $v_i(\Psi)$ be the supremum of the expected total discounted reward over an infinite period if the system starts from the generalized state $(i,\Psi)$. Theorem 5.1.1 shows that an optimal sampling strategy exists and, therefore, that $v_i(\Psi)$ is the maximum expected discounted reward over an infinite period.

If, when in state $(i,\Psi)$, it is decided to sample the $k$th alternative and the system then makes a transition to state $j$, the supremum of the posterior expected reward is

$$ a_{ij}^k(\Psi) = r_{ij}^k + \beta\, v_j\big(T_{ij}^k(\Psi)\big). \qquad (5.1.1) $$
Model I is easily generalized to allow the sampling cost to be $c_{ij}^k$, the cost of observing a transition from state $i$ to state $j$ under the $k$th alternative in state $i$. In this case, (5.1.3) below is
The probability of the sample outcome $j$, unconditional with respect to the prior distribution, given that the system is in state $(i,\Psi)$ and alternative $k$ is in use, is the prior expected value of $p_{ij}^k$,
Let

$$ C_i^k(\Psi) = \sum_{j=1}^{N} \bar p_{ij}^k(\Psi)\, c_j \qquad (5.1.3) $$

be the expected cost of sampling alternative $k$ when in the state $(i,\Psi)$. Then, if it is decided to sample the process on the next transition, the supremum of the prior expected reward is
Suppose, on the other hand, it is decided to cease sampling and to operate the process indefinitely under the policy $\delta$. Let

$$ V_i(\delta, \Psi) = \int V_i(\delta, P)\, dF(P|\Psi), \qquad i = 1, \ldots, N \qquad (5.1.5) $$

be the unconditional expected discounted reward over an infinite period when the policy $\delta$ is used and the system starts from $(i,\Psi)$. Then, if it is decided to cease sampling, the maximum prior expected reward is

$$ \max_{\delta \in \Delta} \{ V_i(\delta, \Psi) \}, \qquad i = 1, \ldots, N. \qquad (5.1.6) $$

The maximum in (5.1.6) exists since $\Delta$ is a finite set.
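One way to approximate the terminal values in (5.1.5) and (5.1.6) numerically is by simulation. The sketch assumes that each row $(i,k)$ of $P$ is an independent Dirichlet draw with parameter $m_{i\cdot}^k$, standing in for the prior $F(P|\Psi)$. It enumerates the finite set of deterministic policies $\delta$, averages $V_i(\delta, P) = [(I - \beta P_\delta)^{-1} q_\delta]_i$ over the draws, and then maximizes over $\delta$ separately for each $i$. The counts, rewards and the name `expected_policy_values` are hypothetical. Simulation is only one possible way of evaluating the integral.

```python
import itertools
import numpy as np

def expected_policy_values(M, R, beta, samples=2000, seed=1):
    """Monte Carlo estimate of E[V_i(delta, P) | Psi] for every deterministic policy delta.

    M[i, k] : Dirichlet parameter of row i of P under alternative k (hypothetical prior)
    R[i, k] : rewards r_ij^k for transitions i -> j under alternative k
    """
    rng = np.random.default_rng(seed)
    N, K = M.shape[0], M.shape[1]
    values = {}
    for delta in itertools.product(range(K), repeat=N):
        acc = np.zeros(N)
        for _ in range(samples):
            P = np.array([rng.dirichlet(M[i, delta[i]]) for i in range(N)])
            q = np.array([P[i] @ R[i, delta[i]] for i in range(N)])
            acc += np.linalg.solve(np.eye(N) - beta * P, q)   # V(delta, P)
        values[delta] = acc / samples
    return values

M = np.array([[[2.0, 1.0], [3.0, 1.0]],
              [[1.0, 7.0], [5.0, 3.0]]])
R = np.array([[[3.0, 1.0], [2.0, 2.5]],
              [[0.0, 4.0], [1.0, 3.0]]])
beta = 0.5
values = expected_policy_values(M, R, beta)
terminal = np.max(np.array(list(values.values())), axis=0)  # max over delta, state by state
print(terminal)
```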
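Using equations (5.1.4) and (5.1.6), we obtain the following set of functional equations, which must be satisfied by $v(\Psi) = (v_1(\Psi), \ldots, v_N(\Psi))$:

$$ v_i(\Psi) = \max\Big\{ \max_{1 \le k \le K_i} \Big[ \bar q_i^k(\Psi) - C_i^k(\Psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\Psi)\, v_j\big(T_{ij}^k(\Psi)\big) \Big],\ \max_{\delta \in \Delta} \{ V_i(\delta,\Psi) \} \Big\}, \qquad (5.1.7) $$

$i = 1, \ldots, N$; $0 < \beta < 1$, where $\bar q_i^k(\Psi) = \sum_j \bar p_{ij}^k(\Psi)\, r_{ij}^k$.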
It is to be noted that the symbol $v_i(\Psi)$ was also used in equation


This has been done to keep a necessarily complicated notation somewhat simpler. The meaning of the symbol $v_i(\Psi)$ will always be clear from its context.

We now show that an optimal sampling strategy exists, using the definitions and notation of Section 3.4.


Theorem 5.1.1 Let $v_i(\Psi; d)$ be the expected total discounted reward in Model I when the system starts in the generalized state $(i,\Psi)$ and the sampling strategy $d \in D_i$ is used. Let

$$ v_i(\Psi) = \sup_{d \in D_i} v_i(\Psi; d), \qquad i = 1, \ldots, N. \qquad (5.1.8) $$

Then there is a sampling strategy $d^\circ \in D_i$ such that

$$ v_i(\Psi) = v_i(\Psi; d^\circ), \qquad i = 1, \ldots, N. \qquad (5.1.9) $$
Proof. Consider the adaptive control problem of Section 3.1. Let $\hat D_i$ be the set of all possible sampling strategies $d$ in the adaptive control problem when the system starts from state $i$. In Model I, if $d \in D_i$, then $d$ is clearly a possible strategy in the adaptive control problem. Hence $D_i \subset \hat D_i$ $(i = 1, \ldots, N)$. Now suppose $d \in \hat D_i$. Either $d$ prescribes a fixed policy, $\delta$, for use on every transition $m > n$ for some integer $n$, or it does not. In the first case, $d$ is clearly a possible strategy for Model I. In the second case, $d$ is also a possible strategy for Model I, namely a strategy in which a terminal decision point is never reached under some or all possible sample histories. Thus $\hat D_i \subset D_i$ and, therefore, $D_i = \hat D_i$. The proof of Theorem 3.4.3 is valid for an arbitrary reward structure, provided that the reward per transition is bounded. The remainder of the proof of Theorem 5.1.1 therefore follows the proof of Theorem 3.3.4 and will not be duplicated here. Q.E.D.

















We now show that, with probability one, a terminal decision point is reached in Model I if an optimal sampling strategy is used. Let $Q$ denote the true state of nature. $Q$ is assumed to be positive, as defined by equation (2.3.26).


Theorem 5.1.2 In Model I, if the true state of nature, $Q$, is a positive matrix and if the terminal functions $V_i(\delta,\Psi)$ are continuous in $\Psi$ $(i = 1, \ldots, N;\ \delta \in \Delta)$, then, with probability one, a terminal decision point is reached in an optimal sampling strategy.
Proof. The proof is by contradiction. Assume there is an optimal sampling strategy in which a terminal decision is never made. Then the process is sampled infinitely often under the consecutive sampling rule, and at least one state, $i$, is entered infinitely often. At least one alternative, $k$, must be used infinitely often when in state $i$. Lemma 2.3.7 and the positivity of $Q$ then imply that every state is entered infinitely often and, therefore, that at least one alternative in each state is sampled infinitely often. Since the system is operating under an optimal strategy, any alternative which is sampled a finite number of times is dominated by other alternatives after a finite number of transitions and can be eliminated from further consideration. Thus we may assume, without loss of generality, that all alternatives are sampled infinitely often.
By Theorem 2.3.8, the mass of the posterior distribution of $P$ tends, with probability one, to concentrate at $Q$ as $n$, the number of transitions, goes to infinity. That is, for any $\varepsilon > 0$, if $\Psi_n$ is defined by equation (2.3.31),

$$ \lim_{n\to\infty} \Pr\big[\, |P - Q| < \varepsilon \mid \Psi_n \,\big] = 1, \qquad (5.1.10) $$

the limit holding with probability one. Let $F(P|\Psi^*)$ be the distribution function which places unit mass of probability on $Q$. Then, with probability one, $\Psi_n \to \Psi^*$ as $n \to \infty$, and equation (5.1.4) becomes
$$ v_i(\Psi^*) = \max_{1 \le k \le K_i} \sum_{j=1}^{N} q_{ij}^k \big[ r_{ij}^k - c_j + \beta\, v_j(\Psi^*) \big], \qquad i = 1, \ldots, N. \qquad (5.1.11) $$
Equation (5.1.11) describes a sequential decision problem in which the decision-maker is certain about the transition probabilities. Such problems have been studied by several authors. Blackwell [14] has shown that an optimal strategy exists for (5.1.11) in which a fixed policy, $\delta \in \Delta$, is used on every transition. Howard [32] has shown that the expected reward under this strategy is, in the notation of this proof,
The argument is valid even if the distribution which places unit mass of probability on $Q$ is not a member of $H$.
$V_i(\delta, \Psi_n) \to V_i(\delta, \Psi^*)$ with probability one. Let Hic, +*) = ae ry} e Since $c_i > 0$ $(i = 1, \ldots, N)$, equation (4.3.4) implies
Therefore, with probability one, a terminal decision point is reached after a finite number of transitions. Q.E.D.
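The key comparison at the end of this proof can be checked numerically. The sketch uses a hypothetical positive $Q$, known rewards and a constant observation cost $c$. For every fixed policy, sampling forever costs $\sum_{m \ge 0} \beta^m c = c/(1-\beta)$ relative to operating the same policy without observation. The best sample-forever value is therefore strictly below the best terminal value. The sketch enumerates deterministic policies. For a known finite discounted chain an optimal stationary deterministic policy exists, so the componentwise best over this set is the optimal value. All numbers and names are illustrative.

```python
import itertools
import numpy as np

# Hypothetical known (positive) transition probabilities Q[i, k, j] and rewards R[i, k, j].
Q = np.array([[[0.6, 0.4], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[[3.0, 1.0], [2.0, 2.5]],
              [[0.0, 4.0], [1.0, 3.0]]])
c = np.array([0.2, 0.2])            # cost c_j of observing the system in state j
beta = 0.9

def policy_value(delta, cost):
    """Discounted value of fixed policy delta when each observed state j costs cost[j]."""
    P = np.array([Q[i, delta[i]] for i in range(2)])
    q = np.array([P[i] @ (R[i, delta[i]] - cost) for i in range(2)])
    return np.linalg.solve(np.eye(2) - beta * P, q)

policies = list(itertools.product(range(2), repeat=2))
stop_value = np.max([policy_value(d, np.zeros(2)) for d in policies], axis=0)  # terminal decision
sample_forever = np.max([policy_value(d, c) for d in policies], axis=0)        # never stop
print(stop_value, sample_forever)
```

With a constant cost of 0.2 and $\beta = 0.9$, the gap is exactly $0.2/(1-0.9) = 2$ in every state.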


5.2 Existence and Uniqueness of Solutions. Successive Approximations.
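Let the sequence of vector functions $v(n,\Psi) = (v_1(n,\Psi), \ldots, v_N(n,\Psi))$ be defined by the equations

$$ v_i(n+1,\Psi) = \max\Big\{ \max_{1 \le k \le K_i} \Big[ \bar q_i^k(\Psi) - C_i^k(\Psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\Psi)\, v_j\big(n, T_{ij}^k(\Psi)\big) \Big],\ \max_{\delta \in \Delta} \{ V_i(\delta,\Psi) \} \Big\}, \qquad (5.2.1a) $$

$i = 1, \ldots, N$; $n = 0, 1, 2, \ldots$; $0 < \beta < 1$,

$$ v_i(0,\Psi) = \max_{\delta \in \Delta} \{ V_i(\delta,\Psi) \}, \qquad i = 1, \ldots, N. \qquad (5.2.1b) $$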
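The recursion (5.2.1) can be sketched for a small problem. All of the following are illustrative assumptions. $\Psi$ is a table of matrix beta counts with $\bar p_{ij}^k = m_{ij}^k / \sum_l m_{il}^k$, and $T_{ij}^k$ adds one to $m_{ij}^k$. The sampling term is written as $\sum_j \bar p_{ij}^k (r_{ij}^k - c_j + \beta v_j)$, which equals $\bar q_i^k - C_i^k + \beta \sum_j \bar p_{ij}^k v_j$ when the cost $c_j$ is charged for observing state $j$. The terminal function `terminal` is a crude plug-in stand-in: it evaluates each policy at the prior mean $\bar P(\Psi)$ rather than computing the prior expectation $\max_\delta V_i(\delta, \Psi)$ that (5.2.1b) calls for. Memoization ensures that each repeated posterior is computed only once.

```python
import itertools
from functools import lru_cache

import numpy as np

# Hypothetical problem data: N = 2 states, K = 2 alternatives in each state.
R = np.array([[[3.0, 1.0], [2.0, 2.5]],
              [[0.0, 4.0], [1.0, 3.0]]])   # R[i, k, j] = r_ij^k
c = np.array([0.3, 0.3])                    # cost c_j of observing state j
BETA = 0.8
N, K = R.shape[0], R.shape[1]

def mean_P(m):
    """Prior mean pbar[i, k, j] of the transition probabilities under counts m."""
    m = np.asarray(m, dtype=float)
    return m / m.sum(axis=2, keepdims=True)

def terminal(i, m):
    """Stand-in for max_delta V_i(delta, Psi): policy values at the prior mean (assumption)."""
    pbar = mean_P(m)
    best = -np.inf
    for delta in itertools.product(range(K), repeat=N):
        P = np.array([pbar[s, delta[s]] for s in range(N)])
        q = np.array([P[s] @ R[s, delta[s]] for s in range(N)])
        best = max(best, np.linalg.solve(np.eye(N) - BETA * P, q)[i])
    return best

def bump(m, i, k, j):
    """T_ij^k(Psi): add one observed i -> j transition under alternative k."""
    new = [[list(row) for row in state] for state in m]
    new[i][k][j] += 1
    return tuple(tuple(tuple(row) for row in state) for state in new)

@lru_cache(maxsize=None)
def v(i, n, m):
    """v_i(n, Psi) from (5.2.1): the better of sampling once more or stopping now."""
    stop = terminal(i, m)
    if n == 0:
        return stop                                            # (5.2.1b)
    pbar = mean_P(m)
    sample = max(
        sum(pbar[i, k, j] * (R[i, k, j] - c[j] + BETA * v(j, n - 1, bump(m, i, k, j)))
            for j in range(N))
        for k in range(K)
    )
    return max(sample, stop)                                   # (5.2.1a)

M0 = (((1, 1), (2, 1)), ((1, 3), (1, 1)))                      # hypothetical prior counts
print([round(float(v(0, n, M0)), 4) for n in range(5)])
```

Because the terminal function is held fixed, the printed sequence is nondecreasing in $n$, as in the monotonicity argument of this section.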


Using equation (5.2.1), we show in this section that there exists a unique bounded solution to equation (5.1.7). Equation (5.2.1) can then be used as a computational tool to approximate this unique solution. With this application in mind, we derive a bound on the error $e_i(n,\Psi) = v_i(\Psi) - v_i(n,\Psi)$. With the aid of this bound, we show that
Lemma 5.2.1 If $R = \max_{i,j,k} |r_{ij}^k|$ and $c = \max_i c_i$, then the functions $v_i(n,\Psi)$ defined by (5.2.1) have the bound

$$ |v_i(n,\Psi)| \le \frac{R + c}{1 - \beta}, \qquad i = 1, \ldots, N;\ n = 0, 1, 2, \ldots;\ 0 < \beta < 1. \qquad (5.2.2) $$

Proof. By equations (4.3.5) and (4.3.6),

$$ |V_i(\delta,\Psi)| \le \frac{R}{1 - \beta}, \qquad i = 1, \ldots, N \qquad (5.2.3) $$

and, since $c > 0$,

$$ |v_i(0,\Psi)| \le \frac{R}{1 - \beta} < \frac{R + c}{1 - \beta}, \qquad i = 1, \ldots, N. \qquad (5.2.4) $$


Assume that (5.2.2) holds for $n$. Then

$$ |v_i(n+1,\Psi)| \le \max\Big[ R + c + \frac{\beta (R + c)}{1 - \beta},\ \frac{R}{1 - \beta} \Big] = \frac{R + c}{1 - \beta}. \qquad (5.2.5) $$

Q.E.D.


Theorem 5.2.2 If $v(n,\Psi)$ is defined by equation (5.2.1), then $\{v_i(n,\Psi)\}$ is a monotone increasing sequence $(i = 1, \ldots, N)$, and $\lim_{n\to\infty} v(n,\Psi)$ exists and is a solution to (5.1.7).
Proof. We show by induction that $\{v_i(n,\Psi)\}$ is a monotone increasing sequence. Since $v_i(0,\Psi) = \max_\delta \{V_i(\delta,\Psi)\}$, we have $v_i(1,\Psi) \ge v_i(0,\Psi)$. Assume that $v_i(n,\Psi) \ge v_i(n-1,\Psi)$ for $i = 1, \ldots, N$ and all $\Psi$. If $v_i(n,\Psi) = \max_\delta \{V_i(\delta,\Psi)\}$, then $v_i(n+1,\Psi) \ge v_i(n,\Psi)$. Suppose instead that, for some $k \in \{1, \ldots, K_i\}$,

$$ v_i(n,\Psi) = \bar q_i^k(\Psi) - C_i^k(\Psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\Psi)\, v_j\big(n-1, T_{ij}^k(\Psi)\big). \qquad (5.2.6) $$














Then, since

$$ v_i(n+1,\Psi) \ge \bar q_i^k(\Psi) - C_i^k(\Psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\Psi)\, v_j\big(n, T_{ij}^k(\Psi)\big), \qquad (5.2.7) $$

we have

$$ v_i(n+1,\Psi) - v_i(n,\Psi) \ge \beta \sum_{j=1}^{N} \bar p_{ij}^k(\Psi) \Big[ v_j\big(n, T_{ij}^k(\Psi)\big) - v_j\big(n-1, T_{ij}^k(\Psi)\big) \Big] \ge 0, \qquad (5.2.8) $$

proving the induction. By Lemma 5.2.1 the sequence $\{v_i(n,\Psi)\}$ is bounded, so $\lim_{n\to\infty} v_i(n,\Psi)$ exists $(i = 1, \ldots, N)$. Letting $n \to \infty$ in (5.2.1) shows that the limit is a solution of (5.1.7). Q.E.D.


Theorem 5.2.3 There is a unique bounded solution to equation (5.1.7).

Proof. It was shown in Theorem 5.2.2 that there is at least one bounded solution, $v(\Psi)$, to (5.1.7). Assume that $w(\Psi) = (w_1(\Psi), \ldots, w_N(\Psi))$ is also a bounded solution. Let

$$ S_i^k(v,\Psi) = \bar q_i^k(\Psi) - C_i^k(\Psi) + \beta \sum_{j=1}^{N} \bar p_{ij}^k(\Psi)\, v_j\big(T_{ij}^k(\Psi)\big). \qquad (5.2.9) $$

Fix $(i,\Psi)$. There are four cases.

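Case 1. $v_i(\Psi) = w_i(\Psi) = \max_\delta \{V_i(\delta,\Psi)\}$. Then $v_i(\Psi) = w_i(\Psi)$.

Case 2. For some $a \in \{1, \ldots, K_i\}$, $v_i(\Psi) = S_i^a(v,\Psi)$ and $w_i(\Psi) = \max_\delta \{V_i(\delta,\Psi)\}$. Then

$$ 0 \le v_i(\Psi) - w_i(\Psi) \le S_i^a(v,\Psi) - S_i^a(w,\Psi). \qquad (5.2.10) $$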
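Case 3. For some $b \in \{1, \ldots, K_i\}$, $w_i(\Psi) = S_i^b(w,\Psi)$ and $v_i(\Psi) = \max_\delta \{V_i(\delta,\Psi)\}$. Then

$$ S_i^b(v,\Psi) - S_i^b(w,\Psi) \le v_i(\Psi) - w_i(\Psi) \le 0. \qquad (5.2.11) $$

Case 4. For some indices $a$ and $b$ belonging to $\{1, \ldots, K_i\}$, $v_i(\Psi) = S_i^a(v,\Psi)$ and $w_i(\Psi) = S_i^b(w,\Psi)$. Then

$$ S_i^b(v,\Psi) - S_i^b(w,\Psi) \le v_i(\Psi) - w_i(\Psi) \le S_i^a(v,\Psi) - S_i^a(w,\Psi). \qquad (5.2.12) $$

Let $k$ index the larger of $|S_i^a(v,\Psi) - S_i^a(w,\Psi)|$ and $|S_i^b(v,\Psi) - S_i^b(w,\Psi)|$. Then, in all of the above cases,

$$ |v_i(\Psi) - w_i(\Psi)| \le |S_i^k(v,\Psi) - S_i^k(w,\Psi)| \le \beta \sum_{j=1}^{N} \bar p_{ij}^k(\Psi)\, \big| v_j\big(T_{ij}^k(\Psi)\big) - w_j\big(T_{ij}^k(\Psi)\big) \big|. \qquad (5.2.13) $$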
Since $v(\Psi)$ and $w(\Psi)$ are both bounded, there exists a number $M > 0$ such that

$$ |v_i(\Psi) - w_i(\Psi)| < M. $$

Repeated application of (5.2.13) yields

$$ |v_i(\Psi) - w_i(\Psi)| < \beta^n M, \qquad n = 1, 2, \ldots \qquad (5.2.14) $$

Since $0 < \beta < 1$, (5.2.14) implies $v_i(\Psi) = w_i(\Psi)$. Q.E.D.
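The contraction argument can be observed numerically in a degenerate case where $\Psi$ no longer changes. One example is a distribution that places unit mass on a known matrix, which sampling leaves unchanged. In that case the right side of (5.1.7) becomes an operator $L$ on vectors in $\mathbb{R}^N$. If $L$ is applied to two different bounded starting vectors, the sup-distance after $n$ applications stays below $\beta^n M$, and both sequences approach the same fixed point. The transition probabilities, rewards, costs and terminal values below are hypothetical.

```python
import numpy as np

# Hypothetical known transition probabilities Q[i, k, j] and rewards R[i, k, j].
Q = np.array([[[0.6, 0.4], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[[3.0, 1.0], [2.0, 2.5]],
              [[0.0, 4.0], [1.0, 3.0]]])
c = np.array([0.2, 0.2])        # observation cost c_j
stop = np.array([15.0, 17.0])   # hypothetical terminal values max_delta V_i(delta)
beta = 0.9

def L(v):
    """One application of the right side of (5.1.7) with Psi held fixed."""
    # sum_j q_ij^k (r_ij^k - c_j) + beta * sum_j q_ij^k v_j, for every (i, k)
    sample = np.einsum("ikj,ikj->ik", Q, R - c) + beta * np.einsum("ikj,j->ik", Q, v)
    return np.maximum(sample.max(axis=1), stop)

def gap_after(v, w, n):
    """sup_i |(L^n v)_i - (L^n w)_i|."""
    for _ in range(n):
        v, w = L(v), L(w)
    return np.max(np.abs(v - w))

v0, w0 = np.array([100.0, -50.0]), np.zeros(2)
M = np.max(np.abs(v0 - w0))
print([gap_after(v0, w0, n) for n in (1, 5, 20)], [beta**n * M for n in (1, 5, 20)])
```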
$$ e_i(n,\Psi) = v_i(\Psi) - v_i(n,\Psi), \qquad i = 1, \ldots, N;\ n = 0, 1, 2, \ldots \qquad (5.2.16) $$

where $v_i(n,\Psi)$ is defined by (5.2.1) and $v_i(\Psi)$ is the unique bounded solution of (5.1.7). Let $S$ and $r$ be defined by equation (3.3.3). Then $e_i(n,\Psi)$ has the bounds


0<ef(nV)< po Fer, (5.2.27) 
ie 8 41, ccog B 
= BIO Ly2yeee 
ocied 


Proof. By Theorem 5.2.2, $\{v_i(n,\Psi)\}$ is a monotone increasing sequence with limit $v_i(\Psi)$. Hence $e_i(n,\Psi) \ge 0$ $(n = 0, 1, 2, \ldots)$. The remainder of the inequality (5.2.17) is proved by induction.

We first establish that SR. Since V, Y4 (Let) < R= 5 pe a5 
for all CcZ, wo have 
and (5.2.17) holds for $n = 0$. Assume the inequality is valid for $n$. Then, arguing as in the proof of Theorem 5.2.3, there is an index $k \in \{1, \ldots, K_i\}$ such that
Corollary 5.2.5 If $v(n,\Psi)$ is defined by equation (5.2.1) and $v(\Psi)$ is the unique bounded solution of equation (5.1.7), then $\lim_{n\to\infty} v(n,\Psi) = v(\Psi)$ uniformly in $\Psi$.

Proof. Since the error bound (5.2.17) is independent of $\Psi$, the corollary follows from Theorem 5.2.4. Q.E.D.


Theorem 5.2.6 If the prior distribution function of P is $H(P\mid\tau)\in\mathcal H$, a family of distributions continuous in $\tau$ which is closed under consecutive sampling, then $v(\tau)$, the unique bounded solution of (5.1.7), is a continuous function of $\tau$.

Proof. Since H is continuous in $\tau$, $v_i(0,\tau) = \max_{\delta\in\Delta}\bar V_i(\delta,\tau)$ is a continuous function of $\tau$. Moreover, the posterior parameter $\tau_{ij}^k(\tau)$ is continuous in $\tau$; hence, by induction, $v_i(n,\tau)$ is continuous for $i = 1,\dots,N$ and $n = 1,2,\dots$. Since $\{v(n,\tau)\}\to v(\tau)$ uniformly in $\tau$, $v_i(\tau)$ is continuous $(i = 1,\dots,N)$. Q.E.D.
In considering Model I it is immediately apparent that the maximum expected reward will not be decreased, and may be increased, if we stop sampling in some states while continuing to sample in others. For example, if the marginal prior distribution of $P_i$, the ith row of P, is loose, while the marginal prior distribution of the remaining $(N-1)$ rows of P is tight, it may be profitable to sample only when the system is in state i. Model II admits this additional option.

As in Section 5.1, let $v_i(\tau)$ be the supremum of the expected total discounted reward over an infinite period when the system starts from the generalized state $(i,\tau)$. It is assumed that the decision-maker can sample the system, can use the system over a period of n transitions, or can make a terminal decision. If the system is sampled, the consecutive sampling rule is operative, and if the system is used, the ν-step sampling rule is operative. Thus, we shall assume that the prior distribution function of P is $H(P\mid\tau)\in\mathcal H$, a family of distributions closed under the ν-step sampling rule. Theorem 2.3. implies that $\mathcal H$ is the mixed extension of a family of distributions which is closed under the consecutive sampling rule and, therefore, $\mathcal H$ is also closed under consecutive sampling by Theorem 2.3.3. Thus, if the decision-maker is in the state $(i,\tau)$ and chooses either to sample or to use the process, the posterior distribution of P will be a member of $\mathcal H$.

If it is decided to sample when in state $(i,\tau)$, the supremum of the prior expected reward is given by equation (5.1.4). Suppose, on the other hand, it is decided to use the process under policy δ for $n \ge 1$ transitions. The probability that the system will be observed in state j, unconditional with regard to the prior distribution of P, given that the system starts in the generalized state $(i,\tau)$ and that n transitions are to be observed under the policy δ, is
$$\bar p_{ij}^{(n)}(\delta,\tau) = \int p_{ij}^{(n)}(\delta)\,dH(P\mid\tau), \qquad i,j = 1,\dots,N;\ n = 1,2,\dots, \tag{5.3.1}$$
the $(i,j)$th element of the prior expected n-step transition probability matrix under policy δ. Let $\bar q_i^{(n)}(\delta,\tau)$ denote the prior expected discounted reward earned over n transitions under the policy δ when the system starts from $(i,\tau)$. Both $\bar p_{ij}^{(n)}(\delta,\tau)$ and $\bar q_i^{(n)}(\delta,\tau)$ are discussed in Section 4.4. Let $\tau_{ij}(n,\delta,\tau)$ denote the parameter of the posterior distribution of P when the system starts from $(i,\tau)$ and is observed in state j after n transitions under the policy δ. The prior expected reward under these conditions is


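$$\bar q_i^{(n)}(\delta,\tau) + \beta^n\sum_{j=1}^N\bar p_{ij}^{(n)}(\delta,\tau)\,v_j\big(\tau_{ij}(n,\delta,\tau)\big), \qquad i = 1,\dots,N;\ n = 1,2,\dots;\ \delta\in\Delta, \tag{5.3.2}$$
and, if it is decided to use the system, the supremum of the expected discounted reward is
$$\sup_{n=1,2,\dots}\ \max_{\delta\in\Delta}\Big[\bar q_i^{(n)}(\delta,\tau) + \beta^n\sum_{j=1}^N\bar p_{ij}^{(n)}(\delta,\tau)\,v_j\big(\tau_{ij}(n,\delta,\tau)\big)\Big], \qquad i = 1,\dots,N. \tag{5.3.3}$$

Finally, if it is decided to make a terminal decision when in the generalized state $(i,\tau)$, the supremum of the expected total discounted reward is
$$\max_{\delta\in\Delta}\bar V_i(\delta,\tau), \qquad i = 1,\dots,N. \tag{5.3.4}$$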
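The prior expected n-step transition probabilities used in Model II are the mean of $P(\delta)^n$, which in general is not the nth power of the mean matrix. The sketch below estimates that mean by Monte Carlo under an assumption made only for illustration: the rows of $P(\delta)$ for a fixed policy δ are independent Dirichlet (multivariate beta) random vectors with a hypothetical parameter matrix `alpha`.

```python
import numpy as np

def expected_n_step(alpha, n, draws=20000, seed=0):
    """Monte Carlo estimate of E[P^n] when row i of P is Dirichlet(alpha[i]), rows independent."""
    rng = np.random.default_rng(seed)
    # P has shape (draws, N, N); P[d, i, :] is a draw from Dirichlet(alpha[i])
    P = np.stack([rng.dirichlet(a, size=draws) for a in alpha], axis=1)
    return np.linalg.matrix_power(P, n).mean(axis=0)

alpha = np.array([[1.0, 1.0, 1.0],
                  [8.0, 1.0, 1.0],
                  [1.0, 1.0, 8.0]])                    # hypothetical prior parameters
P_mean = alpha / alpha.sum(axis=1, keepdims=True)     # mean of each Dirichlet row

est1 = expected_n_step(alpha, 1)
est3 = expected_n_step(alpha, 3)
print(np.round(est3, 3))                               # prior expected 3-step matrix
print(np.round(np.linalg.matrix_power(P_mean, 3), 3))  # plug-in value (mean matrix)^3
```

Comparing the two printed matrices shows how far the plug-in value can be from the prior expectation for this choice of `alpha`.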





We shall anticipate Theorem 5.3.1, which establishes the existence of an optimal sampling strategy for Model II, and write $\max_{n=1,2,\dots}$ in place of $\sup_{n=1,2,\dots}$ in equation (5.3.3). Then the vector function $v(\tau) = (v_1(\tau),\dots,v_N(\tau))$ must satisfy the following functional equations:






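$$v_i(\tau) = \max\Big\{\max_{k=1,\dots,K_i}\Big[\bar q_i^k(\tau) - c_i^k(\tau) + \beta\sum_{j=1}^N\bar p_{ij}^k(\tau)\,v_j\big(\tau_{ij}^k(\tau)\big)\Big],\ \max_{n=1,2,\dots}\ \max_{\delta\in\Delta}\Big[\bar q_i^{(n)}(\delta,\tau) + \beta^n\sum_{j=1}^N\bar p_{ij}^{(n)}(\delta,\tau)\,v_j\big(\tau_{ij}(n,\delta,\tau)\big)\Big],\ \max_{\delta\in\Delta}\bar V_i(\delta,\tau)\Big\}, \qquad i = 1,\dots,N. \tag{5.3.5}$$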


We shall now consider some properties of Model II. These properties, for the most part, parallel those of Model I, and the proofs are quite similar to those of Section 5.2. In the following two theorems it is shown that an optimal sampling strategy exists for Model II and that, in an optimal sampling strategy, a terminal decision point is reached with probability one. We then demonstrate the existence of a unique bounded solution to equation (5.3.5) and consider a method of successive approximations, together with a bound on the error of the nth approximant.


Theorem 5.3.1 Let $v_i(\tau,d)$ be the expected total discounted reward in Model II when the system starts in the generalized state $(i,\tau)$ and the sampling strategy $d\in D_2$ is used. Let
$$v_i(\tau) = \sup_{d\in D_2} v_i(\tau,d), \qquad i = 1,\dots,N. \tag{5.3.6}$$
Then there is a sampling strategy $d^*\in D_2$ such that
$$v_i(\tau) = v_i(\tau,d^*), \qquad i = 1,\dots,N. \tag{5.3.7}$$

Proof. Let $D$ be the set of all possible sampling strategies in the adaptive control problem. Then $D_2 \subseteq D$. Suppose $d\in D$. Then $d$ either prescribes a fixed policy, δ, for use on every transition $n > n'$, for some integer $n'$, or it does not. In either case, $d$ is a possible strategy for Model II, and $D \subseteq D_2$. Thus, $D = D_2$. The remainder of the proof is analogous to the proof of Theorem 5.1.1. Q.E.D.


Theorem 5.3.2 In Model II, if the true state of nature, P, is a positive matrix and if the terminal functions $\bar V_i(\delta,\tau)$ are continuous in $\tau$ $(i = 1,\dots,N;\ \delta\in\Delta)$, then, with probability one, a terminal decision point is reached in an optimal sampling strategy.

Proof. Assume there is an optimal sampling strategy in which a terminal decision is never made. We shall show a contradiction. Let a decision point in the sample history be a point in time at which the state of the system is made known to the decision-maker. Then the assumption is that there are an infinite number of decision points. There is at least one state, i, which is observed infinitely often, and at least one policy, δ, and transition interval, n, which are used infinitely often in state i. Lemma 2.3.7 and the positivity of P imply that every state is observed infinitely often with probability one. Since a terminal decision is never made, there is a finite integer, M, such that, if n is a transition interval which is used in the sampling strategy, then $n \le M$. For if not, an infinite transition interval is used at some stage, which is equivalent to a terminal decision. Thus, there is a finite set of ordered pairs, $(n,\delta)$, where $n\in\{1,2,\dots,M\}$ and $\delta\in\Delta$, which describe the decisions made at each decision point. We may assume, without loss of generality, that all members of this finite set are used infinitely often in the sampling strategy, since any pair, $(n,\delta)$, which is used only a finite number of times is eventually dominated. The conditions of Theorem 2.3.9 are satisfied and, therefore, the mass of the posterior distribution of P tends, with probability one, to concentrate at P as ν, the number of decision points, goes to infinity. If $H(P\mid\tau^0)$ is the distribution function which places the unit mass of probability on P, then $\tau\to\tau^0$ as $\nu\to\infty$, and equation (5.3.5) becomes*
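$$v_i(\tau^0) = \max\Big\{\max_{k=1,\dots,K_i}\Big[\bar q_i^k(\tau^0) - c_i^k(\tau^0) + \beta\sum_{j=1}^N\bar p_{ij}^k(\tau^0)\,v_j(\tau^0)\Big],\ \max_{n=1,\dots,M}\ \max_{\delta\in\Delta}\Big[\bar q_i^{(n)}(\delta,\tau^0) + \beta^n\sum_{j=1}^N\bar p_{ij}^{(n)}(\delta,\tau^0)\,v_j(\tau^0)\Big],\ \max_{\delta\in\Delta}\bar V_i(\delta,\tau^0)\Big\}, \qquad i = 1,\dots,N. \tag{5.3.8}$$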


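It was shown, during the proof of Theorem 5.1.2, that if
$$v_i(\tau^0) = \max_{k=1,\dots,K_i}\Big[\bar q_i^k(\tau^0) - c_i^k(\tau^0) + \beta\sum_{j=1}^N\bar p_{ij}^k(\tau^0)\,v_j(\tau^0)\Big], \qquad i = 1,\dots,N, \tag{5.3.9}$$
a contradiction results. Suppose, then, that
$$v_i(\tau^0) = \max_{n=1,\dots,M}\ \max_{\delta\in\Delta}\Big[\bar q_i^{(n)}(\delta,\tau^0) + \beta^n\sum_{j=1}^N\bar p_{ij}^{(n)}(\delta,\tau^0)\,v_j(\tau^0)\Big], \qquad i = 1,\dots,N. \tag{5.3.10}$$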





* The argument remains valid when the distribution which places the unit mass of probability on P is not a member of $\mathcal H$.
We may construct a new set of policies as follows. Let $\sigma = (s_1,\dots,s_N)$ be a policy vector, where $s_i = (n_i,\delta^i)$ is a choice of a transition interval $n_i\in\{1,\dots,M\}$ and a policy $\delta^i\in\Delta$. If the alternative $s_i$ is selected in state i, then the system goes to state j with probability $\bar p_{ij}^{(n_i)}(\delta^i,\tau^0)$, earning the expected reward $\bar q_i^{(n_i)}(\delta^i,\tau^0)$. If $\mathcal S$ is the set of all possible alternatives $s_i$, then equation (5.3.10) can be written as
$$v_i(\tau^0) = \max_{s_i\in\mathcal S}\Big[\bar q_i^{(n_i)}(\delta^i,\tau^0) + \beta^{n_i}\sum_{j=1}^N\bar p_{ij}^{(n_i)}(\delta^i,\tau^0)\,v_j(\tau^0)\Big], \qquad i = 1,\dots,N. \tag{5.3.11}$$
Equation (5.3.11) has the same formal structure as equation (5.1.11) in the proof of Theorem 5.1.2. An argument similar to the one used there shows the contradiction
$$v_i(\tau^0) < \max_{\delta\in\Delta}\bar V_i(\delta,\tau^0), \qquad i = 1,\dots,N, \tag{5.3.12}$$
which is incompatible with (5.3.8). Thus, with probability one, a terminal decision point is reached. Q.E.D.
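Let us now consider the existence and uniqueness of solutions to the functional equation (5.3.5). Let the sequence of vector functions, $v(n,\tau)$, where $v(n,\tau) = (v_1(n,\tau),\dots,v_N(n,\tau))$, be defined by the following equations:
$$v_i(n+1,\tau) = \max\Big\{\max_{k=1,\dots,K_i}\Big[\bar q_i^k(\tau) - c_i^k(\tau) + \beta\sum_{j=1}^N\bar p_{ij}^k(\tau)\,v_j\big(n,\tau_{ij}^k(\tau)\big)\Big],\ \max_{\nu=1,2,\dots}\ \max_{\delta\in\Delta}\Big[\bar q_i^{(\nu)}(\delta,\tau) + \beta^\nu\sum_{j=1}^N\bar p_{ij}^{(\nu)}(\delta,\tau)\,v_j\big(n,\tau_{ij}(\nu,\delta,\tau)\big)\Big],\ \max_{\delta\in\Delta}\bar V_i(\delta,\tau)\Big\}, \qquad i = 1,\dots,N;\ n = 0,1,2,\dots, \tag{5.3.13a}$$
$$v_i(0,\tau) = \max_{\delta\in\Delta}\bar V_i(\delta,\tau), \qquad i = 1,\dots,N. \tag{5.3.13b}$$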
Lemma 5.3.3 If $\bar R = \max_{i,j,k}|r_{ij}^k|$ and $C = \sup_{i,k,\tau} c_i^k(\tau)$, then the functions $v_i(n,\tau)$ defined by (5.3.13) have the bound
$$|v_i(n,\tau)| \le \frac{\bar R + C}{1-\beta}, \qquad i = 1,\dots,N;\ n = 0,1,2,\dots;\ 0<\beta<1. \tag{5.3.14}$$


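Proof. The proof is by induction. Equation (5.2.4) shows that (5.3.14) holds for $n = 0$. Assume it holds for n. Since
$$|\bar q_i^{(\nu)}(\delta,\tau)| \le \bar R\,\frac{1-\beta^\nu}{1-\beta}, \qquad i = 1,\dots,N;\ \nu = 1,2,\dots;\ \delta\in\Delta, \tag{5.3.15}$$
we have, from (5.3.13a) and the induction hypothesis,
$$|v_i(n+1,\tau)| \le \max\Big\{\bar R + C + \beta\,\frac{\bar R + C}{1-\beta},\ \sup_{\nu\ge 1}\Big[\bar R\,\frac{1-\beta^\nu}{1-\beta} + \beta^\nu\,\frac{\bar R + C}{1-\beta}\Big],\ \frac{\bar R}{1-\beta}\Big\} = \frac{\bar R + C}{1-\beta}. \tag{5.3.16}$$
Q.E.D.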
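For a known transition matrix the discounted ν-step reward bounded in (5.3.15) is $\sum_{m=0}^{\nu-1}\beta^m P^m q$ with $q_i = \sum_j p_{ij}r_{ij}$; the prior expected version averages this quantity over $H(P\mid\tau)$. A minimal sketch of the known-matrix case, with hypothetical random data, checking the bound $\bar R(1-\beta^\nu)/(1-\beta)$ of Lemma 5.3.3:

```python
import numpy as np

def discounted_nu_step_reward(P, rewards, beta, nu):
    """Return sum_{m=0}^{nu-1} beta**m P^m q, where q_i = sum_j p_ij r_ij."""
    q = (P * rewards).sum(axis=1)          # expected one-step reward from each state
    total = np.zeros_like(q)
    Pm = np.eye(P.shape[0])                # P^0
    for m in range(nu):
        total += beta ** m * (Pm @ q)
        Pm = Pm @ P
    return total

rng = np.random.default_rng(2)
N, beta = 4, 0.9
P = rng.random((N, N))
P /= P.sum(axis=1, keepdims=True)          # hypothetical known transition matrix
rewards = rng.uniform(-3.0, 2.0, size=(N, N))
R_bar = np.abs(rewards).max()              # plays the role of R-bar in Lemma 5.3.3

for nu in (1, 5, 25):
    q_nu = discounted_nu_step_reward(P, rewards, beta, nu)
    print(nu, np.abs(q_nu).max(), R_bar * (1 - beta ** nu) / (1 - beta))
```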


Theorem 5.3.4 If $v(n,\tau)$ is defined by equation (5.3.13), then $\{v_i(n,\tau)\}$ is a monotone increasing sequence $(i = 1,\dots,N)$ and $\lim_{n\to\infty} v(n,\tau)$ exists and is a solution of (5.3.5).

Proof. The proof that $\{v_i(n,\tau)\}$ is monotone increasing is inductive. Clearly, $v_i(1,\tau) \ge v_i(0,\tau)$ for $i = 1,\dots,N$ and all $\tau$. Assume that $v_i(n,\tau) \ge v_i(n-1,\tau)$ for all i and $\tau$. If, for some $\delta\in\Delta$ and some integer ν,
$$v_i(n,\tau) = \bar q_i^{(\nu)}(\delta,\tau) + \beta^\nu\sum_{j=1}^N\bar p_{ij}^{(\nu)}(\delta,\tau)\,v_j\big(n-1,\tau_{ij}(\nu,\delta,\tau)\big), \tag{5.3.17}$$
then
$$v_i(n+1,\tau) - v_i(n,\tau) \ge \beta^\nu\sum_{j=1}^N\bar p_{ij}^{(\nu)}(\delta,\tau)\Big[v_j\big(n,\tau_{ij}(\nu,\delta,\tau)\big) - v_j\big(n-1,\tau_{ij}(\nu,\delta,\tau)\big)\Big] \ge 0. \tag{5.3.18}$$
If $v_i(n,\tau) = \max_{\delta\in\Delta}\bar V_i(\delta,\tau)$, then clearly $v_i(n+1,\tau) \ge v_i(n,\tau)$. Similarly, using (5.2.8), we have, in all cases,
$$v_i(n+1,\tau) \ge v_i(n,\tau), \qquad i = 1,\dots,N, \tag{5.3.19}$$
proving monotonicity. By Lemma 5.3.3, the sequence $\{v_i(n,\tau)\}$ is bounded and, therefore, $\lim_{n\to\infty} v_i(n,\tau)$ exists. That the limit satisfies (5.3.5) follows by letting $n\to\infty$ in (5.3.13a). Q.E.D.
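The monotone scheme of Theorem 5.3.4 can be imitated on a toy problem. In this hypothetical sketch the matrices are known (so no posterior updating occurs), there are a few "use" policies, and the use interval ν is truncated at `nu_max`; these simplifications are assumptions of the illustration only. Starting from the terminal values, the iterates increase monotonically, as the theorem asserts for (5.3.13).

```python
import numpy as np

rng = np.random.default_rng(3)

def random_stochastic(shape):
    """Random matrices whose last axis sums to one."""
    A = rng.random(shape)
    return A / A.sum(axis=-1, keepdims=True)

def nu_step_tables(P, q, beta, nu_max):
    """Discounted nu-step rewards and nu-step matrices for nu = 1, ..., nu_max."""
    qs, Ps = [], []
    acc, Pm = np.zeros_like(q), np.eye(len(q))
    for m in range(nu_max):
        acc = acc + beta ** m * (Pm @ q)     # reward over m + 1 transitions
        Pm = Pm @ P                          # (m + 1)-step matrix
        qs.append(acc)
        Ps.append(Pm)
    return qs, Ps

def model_ii_step(v, W, r, c, P_samp, tables, beta):
    """One pass of a (5.3.13a)-style update: sample, use for nu steps, or stop."""
    best = np.maximum(W, (r - c[:, None] + beta * (P_samp @ v)).max(axis=0))
    for qs, Ps in tables:
        for nu, (q_nu, P_nu) in enumerate(zip(qs, Ps), start=1):
            best = np.maximum(best, q_nu + beta ** nu * (P_nu @ v))
    return best

S, K, D, beta, nu_max = 5, 2, 2, 0.85, 6
P_samp = random_stochastic((K, S, S))        # hypothetical sampling alternatives
r, c = rng.random((K, S)), np.full(K, 0.2)
tables = [nu_step_tables(random_stochastic((S, S)), rng.random(S), beta, nu_max)
          for _ in range(D)]                 # hypothetical "use" policies
W = rng.random(S) / (1 - beta)               # hypothetical terminal values

iterates = [W.copy()]                        # v(0, .) = terminal values
for n in range(120):
    iterates.append(model_ii_step(iterates[-1], W, r, c, P_samp, tables, beta))
print(iterates[-1])
```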


The remaining theorems, the proofs of which parallel very closely those of corresponding theorems in Section 5.2, are stated without proof.


Theorem 5.3.5 There is a unique bounded solution to equation (5.3.5).


Theorem 5.3.6 Let the error of the nth approximant, $v_i(n,\tau)$, be defined as
$$e_i(n,\tau) = v_i(\tau) - v_i(n,\tau), \qquad i = 1,\dots,N;\ n = 0,1,2,\dots, \tag{5.3.20}$$
where $v(\tau) = (v_1(\tau),\dots,v_N(\tau))$ is the unique bounded solution of equation (5.3.5) and $v(n,\tau)$ is defined by (5.3.13). Then $e_i(n,\tau)$ has the bounds
$$0 \le e_i(n,\tau) \le \beta^n\,\frac{R - r}{1-\beta}, \qquad i = 1,\dots,N;\ n = 0,1,2,\dots;\ 0<\beta<1, \tag{5.3.21}$$
where R and r are defined by (3.3.3).


Corollary 5.3.7 If $v(n,\tau)$ is defined by equation (5.3.13) and $v(\tau)$ is the unique bounded solution of (5.3.5), then $\lim_{n\to\infty} v(n,\tau) = v(\tau)$ uniformly in $\tau$.


Theorem 5.3.8 If the prior distribution function of P is $H(P\mid\tau)\in\mathcal H$, a family of distributions continuous in $\tau$ which is closed under ν-step sampling, then $v(\tau)$, the unique bounded solution of (5.3.5), is continuous in $\tau$.


The numerical solution of Model II involves considerably more computation than does the solution of Model I. Not only does equation (5.3.13), the successive approximation scheme for Model II, involve evaluation of more terms than does the corresponding scheme for Model I, but the requirement that $\mathcal H$, the family of prior distributions of P, be closed under ν-step sampling implies that $\mathcal H$ is the mixed extension of a family of distributions closed under consecutive sampling. This means that the number of parameters which must be carried in computing solutions to Model II is larger than that required for solving Model I. The additional complexity of Model II is probably worthwhile only in the case of a prior distribution which is tight on some rows of P and loose on others and where the cost of sampling is high.

We note that, while the aim of Model II is to allow the decision-maker to sample only those states in which there are transition probability vectors with loose marginal prior distributions, he does not have full control over the future states in which the system may be observed. For example, suppose it is desired to sample the system only when it is in state i. Then a sampling strategy must be chosen which trades off the expected discounted earnings of the system against the need for a high probability that the system enters state i at each decision point.

We remark, in this connection, that a decision to use the process when in state i does not necessarily imply that the consecutive sampling alternative is dominated at future decision points when the system is found in state i. Such dominance may hold under a sample history which reduces the marginal variances of the alternative transition probabilities in the ith state, but there is certainly no reason to expect this to be the case under a sequence of observations which increases the marginal variances of some of these transition probabilities.


5.4 Approximate Terminal Decisions.

In Models I and II it is necessary to evaluate expressions of the form
$$\bar V_i(\delta^*,\tau) = \max_{\delta\in\Delta}\bar V_i(\delta,\tau), \tag{5.4.1}$$
where $\bar V_i(\delta,\tau)$ is the expected total discounted reward earned over an infinite period under the policy δ when the system starts from the generalized state $(i,\tau)$. Since Δ may contain a large number of policies, it is desirable to find methods of solving (5.4.1) for the maximizing policy $\delta^*$ which avoid a direct search over all elements of Δ. This problem has not been solved, but some preliminary remarks concerning the approximation of $\delta^*$ are offered in this section. It will be seen that these remarks are also applicable to the problem of selecting a policy which maximizes the expected gain, $\bar g(\delta,\tau)$, which was discussed in Section 4.4.

Let $V_i(\delta,P)$ be the conditional expected total discounted reward over an infinite period under policy δ when the system starts from state i and the transition probabilities are given by P. The policy which maximizes this reward can be found efficiently by means of Howard's policy iteration algorithm [22]. It was seen in the proofs of Theorems 5.1.2 and 5.3.2 that, as the number of observations in Model I or Model II goes to infinity, the mass of the posterior probability distribution of P tends, with probability one, to concentrate at the true state of nature. Thus, if $\bar P$ is the mean of the distribution of P, we can approximate $\delta^*$ by $\hat\delta$, where $\hat\delta$ is defined by
$$V_i(\hat\delta,\bar P) = \max_{\delta\in\Delta}V_i(\delta,\bar P). \tag{5.4.2}$$
The error of this approximation goes to zero with probability one as the number of observations of the process goes to infinity. We consider here a bound on the error,
$$e_i(\hat\delta,\tau) = \bar V_i(\delta^*,\tau) - \bar V_i(\hat\delta,\tau). \tag{5.4.3}$$
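The approximating policy $\hat\delta$ is obtained by solving an ordinary discounted Markov decision problem at the mean matrix $\bar P$. The sketch below is a textbook form of Howard's policy iteration, offered as an illustration rather than as the report's procedure; the array layout `P[k, i, j]`, `q[k, i]` (every state has the same K alternatives) and the random data are simplifying assumptions.

```python
import itertools
import numpy as np

def policy_value(P, q, policy, beta):
    """Solve V = q_delta + beta * P_delta V for a fixed deterministic policy."""
    N = len(policy)
    rows = np.arange(N)
    P_d, q_d = P[policy, rows], q[policy, rows]   # row i comes from alternative policy[i]
    return np.linalg.solve(np.eye(N) - beta * P_d, q_d)

def policy_iteration(P, q, beta):
    """Howard's policy iteration; P has shape (K, N, N) and q has shape (K, N)."""
    K, N, _ = P.shape
    rows = np.arange(N)
    policy = np.zeros(N, dtype=int)
    while True:
        V = policy_value(P, q, policy, beta)          # value determination
        test = q + beta * (P @ V)                     # test quantities, shape (K, N)
        new_policy = test.argmax(axis=0)              # policy improvement
        keep = np.isclose(test[policy, rows], test.max(axis=0))
        new_policy[keep] = policy[keep]               # keep the current choice on ties
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

rng = np.random.default_rng(4)
K, N, beta = 3, 4, 0.9
P_bar = rng.random((K, N, N))
P_bar /= P_bar.sum(axis=2, keepdims=True)       # hypothetical mean transition matrices
q = rng.random((K, N))                          # hypothetical expected one-step rewards
delta_hat, V_hat = policy_iteration(P_bar, q, beta)

# Brute-force check over all K**N policies (feasible only for tiny examples).
brute = np.max([policy_value(P_bar, q, np.array(d), beta)
                for d in itertools.product(range(K), repeat=N)], axis=0)
print(delta_hat, V_hat)
```

Because policy iteration maximizes every component of the value vector at once, its result agrees with the brute-force search over all $K^N$ policies.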

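Let the policies, $\delta\in\Delta$, be indexed by j. Thus $\Delta = \{\delta_1,\dots,\delta_J\}$, where J is the number of distinct policies in Δ. For a fixed index i, let $\mathcal P_N$ be partitioned into a set of J mutually exclusive and exhaustive subsets, $S_{ij}$, such that, if $P\in S_{ij}$, then
$$V_i(\delta_j,P) = \max_{\delta\in\Delta}V_i(\delta,P). \tag{5.4.4}$$
If $H(P\mid\tau)$ is the prior distribution function of P, let
$$P_{ij}(\tau) = \int_{S_{ij}} dH(P\mid\tau) \tag{5.4.5}$$
denote the prior probability that P belongs to $S_{ij}$. Since the sets $S_{ij}$ are mutually exclusive and exhaustive,
$$\sum_{j=1}^J P_{ij}(\tau) = 1. \tag{5.4.6}$$
Let $V_{ij}(\delta,\tau)$ be the conditional expected value of $V_i(\delta,P)$, given that $P\in S_{ij}$:
$$V_{ij}(\delta,\tau) = \frac{1}{P_{ij}(\tau)}\int_{S_{ij}} V_i(\delta,P)\,dH(P\mid\tau), \qquad j = 1,\dots,J;\ \delta\in\Delta. \tag{5.4.7}$$

We note that (5.4.4) implies
$$V_{ij}(\delta_j,\tau) \ge V_{ij}(\delta,\tau), \qquad j = 1,\dots,J;\ \delta\in\Delta. \tag{5.4.8}$$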
te £ 


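Let $R(\delta)$ and $r(\delta)$ be the maximum and minimum transition rewards when the policy δ is used, writing $\delta(i)$ for the alternative that δ prescribes in state i:
$$R(\delta) = \max_{i,j} r_{ij}^{\delta(i)}, \tag{5.4.9a}$$
$$r(\delta) = \min_{i,j} r_{ij}^{\delta(i)}, \tag{5.4.9b}$$
and let R and r be defined by equation (3.3.3). By equation (4.3.4),
$$V_i(\delta,P) = \sum_{n=0}^{\infty}\beta^n\sum_{j=1}^N\sum_{k=1}^N p_{ij}^{(n)}(\delta)\,p_{jk}^{\delta(j)}\,r_{jk}^{\delta(j)}, \qquad i = 1,\dots,N;\ 0\le\beta<1, \tag{5.4.10}$$
and, therefore,
$$\frac{r}{1-\beta} \le \frac{r(\delta)}{1-\beta} \le V_i(\delta,P) \le \frac{R(\delta)}{1-\beta} \le \frac{R}{1-\beta}, \qquad i = 1,\dots,N;\ \delta\in\Delta. \tag{5.4.11}$$
Hence, by (5.4.7),
$$\frac{r(\delta)}{1-\beta} \le V_{ij}(\delta,\tau) \le \frac{R(\delta)}{1-\beta}, \qquad j = 1,\dots,J;\ \delta\in\Delta;\ 0\le\beta<1. \tag{5.4.12}$$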
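Lemma 5.4.1 For $j = 1,\dots,J$ the following inequality is valid:
$$V_{ij}(\delta_j,\tau)P_{ij}(\tau) \le \bar V_i(\delta_j,\tau) - \big(1 - P_{ij}(\tau)\big)\frac{r(\delta_j)}{1-\beta}, \qquad j = 1,\dots,J;\ 0\le\beta<1. \tag{5.4.13}$$
Proof. Since the sets $S_{ij}$ partition $\mathcal P_N$, we have, using (5.4.12),
$$\bar V_i(\delta_j,\tau) = V_{ij}(\delta_j,\tau)P_{ij}(\tau) + \sum_{l\ne j}V_{il}(\delta_j,\tau)P_{il}(\tau) \ge V_{ij}(\delta_j,\tau)P_{ij}(\tau) + \big(1 - P_{ij}(\tau)\big)\frac{r(\delta_j)}{1-\beta}. \tag{5.4.14}$$
Equation (5.4.13) is a rearrangement of (5.4.14). Q.E.D.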


Lemma 5.4.2 For any policy $\delta\in\Delta$ and any index $j\in\{1,\dots,J\}$, the expected discounted reward under the policy δ has the upper bound
$$\bar V_i(\delta,\tau) \le \bar V_i(\delta_j,\tau) + \frac{1 - P_{ij}(\tau)}{1-\beta}\big(R - r(\delta_j)\big), \qquad j = 1,\dots,J;\ 0\le\beta<1. \tag{5.4.15}$$

Proof. We have, using (5.4.8), (5.4.11), and Lemma 5.4.1,
$$\bar V_i(\delta,\tau) = V_{ij}(\delta,\tau)P_{ij}(\tau) + \sum_{l\ne j}V_{il}(\delta,\tau)P_{il}(\tau) \le V_{ij}(\delta_j,\tau)P_{ij}(\tau) + \big(1 - P_{ij}(\tau)\big)\frac{R}{1-\beta} \le \bar V_i(\delta_j,\tau) + \frac{1 - P_{ij}(\tau)}{1-\beta}\big(R - r(\delta_j)\big). \tag{5.4.16}$$
Q.E.D.


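Theorem 5.4.3 Let $e_i(\hat\delta,\tau)$ be defined by
$$e_i(\hat\delta,\tau) = \bar V_i(\delta^*,\tau) - \bar V_i(\hat\delta,\tau), \qquad i = 1,\dots,N, \tag{5.4.17}$$
where $\delta^*$ is the maximizing terminal policy defined by equation (5.4.1). Then $e_i(\hat\delta,\tau)$ has the bounds
$$0 \le e_i(\hat\delta,\tau) \le \frac{1 - P_{i\hat\jmath}(\tau)}{1-\beta}\big(R - r(\hat\delta)\big), \qquad i = 1,\dots,N;\ 0\le\beta<1, \tag{5.4.18}$$
where $\hat\jmath$ is the index for which $\delta_{\hat\jmath} = \hat\delta$.

Proof. By equation (5.4.1), $\bar V_i(\delta^*,\tau) \ge \bar V_i(\hat\delta,\tau)$ and $e_i(\hat\delta,\tau) \ge 0$. The upper half of the inequality follows from (5.4.15) with $\delta = \delta^*$ and $\delta_j = \hat\delta$. Q.E.D.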


In order to bound $e_i(\hat\delta,\tau)$ using equation (5.4.18), it is necessary to evaluate $P_{i\hat\jmath}(\tau)$. This problem has not been completely solved, chiefly because there is no satisfactory method of finding the boundaries of the set $S_{i\hat\jmath}$. Moreover, $S_{i\hat\jmath}$ is not necessarily a connected set, which further complicates the problem. The probability $P_{i\hat\jmath}(\tau)$ can be estimated by using numerical or Monte Carlo techniques.
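The following Monte Carlo sketch estimates $P_{i\hat\jmath}(\tau)$ and evaluates the right-hand side of (5.4.18). It assumes, purely for illustration, independent Dirichlet priors on the rows of each alternative's transition matrix, hypothetical rewards, and N and K small enough that each sampled matrix can be classified by enumerating every policy, so the sets $S_{ij}$ are never constructed explicitly.

```python
import itertools
import numpy as np

def policy_values(P, rew, policy, beta):
    """V(delta, P) for every start state; P[k, i, j] and rew[k, i, j] are indexed by alternative k."""
    N = len(policy)
    rows = np.arange(N)
    P_d, r_d = P[policy, rows], rew[policy, rows]
    q_d = (P_d * r_d).sum(axis=1)                 # expected one-step rewards under delta
    return np.linalg.solve(np.eye(N) - beta * P_d, q_d)

rng = np.random.default_rng(5)
K, N, beta, i = 2, 3, 0.9, 0                      # i: starting state of interest
alpha = rng.uniform(0.5, 4.0, size=(K, N, N))    # hypothetical Dirichlet parameters
rew = rng.uniform(0.0, 10.0, size=(K, N, N))     # hypothetical transition rewards
policies = [np.array(d) for d in itertools.product(range(K), repeat=N)]

P_bar = alpha / alpha.sum(axis=2, keepdims=True)
hat = max(range(len(policies)), key=lambda j: policy_values(P_bar, rew, policies[j], beta)[i])

draws = 2000
V_draws = np.empty((draws, len(policies)))       # V_i(delta, P) for each draw and policy
for t in range(draws):
    P = np.array([[rng.dirichlet(alpha[k, s]) for s in range(N)] for k in range(K)])
    V_draws[t] = [policy_values(P, rew, d, beta)[i] for d in policies]

in_S_hat = V_draws[:, hat] >= V_draws.max(axis=1) - 1e-12   # draw lies in S_{i, j_hat}
P_hat = in_S_hat.mean()                                     # estimate of P_{i, j_hat}(tau)
R = rew.max()
r_hat = rew[policies[hat], np.arange(N)].min()              # r(delta_hat)
bound = (1 - P_hat) * (R - r_hat) / (1 - beta)              # right-hand side of (5.4.18)

means = V_draws.mean(axis=0)                                # Monte Carlo estimates of V-bar_i
e_hat = means.max() - means[hat]                            # estimated error e_i(delta_hat, tau)
print(policies[hat], P_hat, e_hat, bound)
```

Treating the simulated draws as the prior, the estimated error cannot exceed the computed bound, since the argument behind (5.4.18) applies to any distribution of P, including the empirical one.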

If $g(\delta,P)$ is the gain of a Markov chain with alternatives when operated indefinitely under the policy δ, and if $\bar g(\delta,\tau)$ is the corresponding expected value when P has the distribution function $H(P\mid\tau)$, then, as we shall see in the next section, it is often necessary to evaluate expressions of the form
$$\bar g(\delta^*,\tau) = \max_{\delta\in\Delta}\bar g(\delta,\tau). \tag{5.4.19}$$
If $\bar P$ is the mean of the distribution $H(P\mid\tau)$, we may wish to approximate $\delta^*$ by $\hat\delta$, defined by the expression
$$g(\hat\delta,\bar P) = \max_{\delta\in\Delta}g(\delta,\bar P). \tag{5.4.20}$$
There are efficient algorithms for the solution of (5.4.20) [16, 22]. Let the error of the approximation $\hat\delta$ be defined as
$$e(\hat\delta,\tau) = \bar g(\delta^*,\tau) - \bar g(\hat\delta,\tau). \tag{5.4.21}$$
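For a policy whose chain is irreducible (true for the positive matrices assumed here), the gain is $g(\delta,P) = \sum_i \pi_i q_i$, where π is the stationary distribution of the chain under δ and $q_i = \sum_j p_{ij}r_{ij}$. The sketch below solves (5.4.20) at a hypothetical mean matrix by enumerating Δ. This is a stand-in for the efficient algorithms cited above and is practical only for very small N and K.

```python
import itertools
import numpy as np

def stationary(P):
    """Stationary row vector pi of an irreducible stochastic matrix: pi P = pi, sum(pi) = 1."""
    N = P.shape[0]
    A = np.vstack([P.T - np.eye(N), np.ones((1, N))])
    b = np.zeros(N + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def gain(P, rew, policy):
    """g(delta, P) = sum_i pi_i q_i for the chain under policy delta."""
    rows = np.arange(len(policy))
    P_d, r_d = P[policy, rows], rew[policy, rows]
    return float(stationary(P_d) @ (P_d * r_d).sum(axis=1))

rng = np.random.default_rng(6)
K, N = 3, 3
alpha = rng.uniform(0.5, 4.0, size=(K, N, N))     # hypothetical prior parameters
rew = rng.uniform(-1.0, 5.0, size=(K, N, N))      # hypothetical transition rewards
P_bar = alpha / alpha.sum(axis=2, keepdims=True)  # mean matrices (positive entries)
policies = [np.array(d) for d in itertools.product(range(K), repeat=N)]
gains = [gain(P_bar, rew, d) for d in policies]
delta_hat = policies[int(np.argmax(gains))]        # solution of (5.4.20) by enumeration
print(delta_hat, max(gains))
```

Each computed gain lies between the smallest and largest transition reward of its policy, consistent with the inequalities relating $g(\delta,P)$ to $r(\delta)$ and $R(\delta)$.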


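A bound on $e(\hat\delta,\tau)$ similar to that of equation (5.4.18) is easily derived. Let $\mathcal P_N$ be the set of all positive N×N generalized stochastic matrices and let $\mathcal P_N$ be partitioned into J sets, $S_j$, where, if $P\in S_j$, then
$$g(\delta_j,P) = \max_{\delta\in\Delta}g(\delta,P). \tag{5.4.22}$$
If $H(P\mid\tau)$ is the prior distribution function of P, let
$$P_j(\tau) = \int_{S_j} dH(P\mid\tau), \qquad j = 1,\dots,J, \tag{5.4.23}$$
be the prior probability that $P\in S_j$, and let
$$g_j(\delta,\tau) = \frac{1}{P_j(\tau)}\int_{S_j} g(\delta,P)\,dH(P\mid\tau), \qquad j = 1,\dots,J, \tag{5.4.24}$$
be the conditional expectation of $g(\delta,P)$ given that $P\in S_j$. Then, by (5.4.22),
$$g_j(\delta_j,\tau) \ge g_j(\delta,\tau), \qquad j = 1,\dots,J;\ \delta\in\Delta. \tag{5.4.25}$$
If $R(\delta)$ and $r(\delta)$ are defined by (5.4.9) and R and r are defined by (3.3.3), the definition of the gain implies the inequalities
$$r \le r(\delta) \le g(\delta,P) \le R(\delta) \le R, \qquad \delta\in\Delta, \tag{5.4.26}$$
and, hence,
$$r(\delta) \le g_j(\delta,\tau) \le R(\delta), \qquad j = 1,\dots,J;\ \delta\in\Delta. \tag{5.4.27}$$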
Lemma 5.4.4 For $j = 1,\dots,J$,
$$g_j(\delta_j,\tau)P_j(\tau) \le \bar g(\delta_j,\tau) - \big(1 - P_j(\tau)\big)r(\delta_j), \qquad j = 1,\dots,J. \tag{5.4.28}$$
Proof. Using (5.4.27),
$$\bar g(\delta_j,\tau) = g_j(\delta_j,\tau)P_j(\tau) + \sum_{l\ne j}g_l(\delta_j,\tau)P_l(\tau) \ge g_j(\delta_j,\tau)P_j(\tau) + \big(1 - P_j(\tau)\big)r(\delta_j). \tag{5.4.29}$$
Q.E.D.


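Lemma 5.4.5 For any policy $\delta\in\Delta$ and any index $j\in\{1,\dots,J\}$,
$$\bar g(\delta,\tau) \le \bar g(\delta_j,\tau) + \big(1 - P_j(\tau)\big)\big(R - r(\delta_j)\big). \tag{5.4.30}$$
Proof. By (5.4.26), (5.4.25), and Lemma 5.4.4,
$$\bar g(\delta,\tau) = g_j(\delta,\tau)P_j(\tau) + \sum_{l\ne j}g_l(\delta,\tau)P_l(\tau) \le g_j(\delta_j,\tau)P_j(\tau) + \big(1 - P_j(\tau)\big)R \le \bar g(\delta_j,\tau) + \big(1 - P_j(\tau)\big)\big(R - r(\delta_j)\big). \tag{5.4.31}$$
Q.E.D.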


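Theorem 5.4.6 The error function $e(\hat\delta,\tau)$ defined by equation (5.4.21) has the bound
$$0 \le e(\hat\delta,\tau) \le \big(1 - P_{\hat\jmath}(\tau)\big)\big(R - r(\hat\delta)\big), \tag{5.4.32}$$
where $\hat\jmath$ is the index for which $\delta_{\hat\jmath} = \hat\delta$.

Proof. The theorem follows directly from equation (5.4.19) and Lemma 5.4.5. Q.E.D.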


5.5 Undiscounted Processes.

We have commented in Section 3.6 on the lack of a clear criterion for making decisions in an adaptive control model with no discounting. These remarks apply as well to undiscounted terminal control models. One criterion we may use is to consider the class of sampling strategies which maximize the expected steady-state gain of the process, then to choose from this class the strategy which maximizes the expected reward over the transient period which precedes the terminal decision. This criterion is made precise in the present section, where Models III and IV, the undiscounted analogues of Models I and II, are introduced. No analysis of these models has been carried out.


5.5.1 Model III. In Model III it is assumed that the process will be sampled consecutively until a terminal decision point is reached, at which time a terminal policy is selected and the system is operated under this policy over a finite terminal operation period.

Let $v_i(\tau;\nu)$ be the supremum of the expected reward over a period whose terminal operation phase lasts for ν transitions when the system starts in state i and the prior distribution of P is $H(P\mid\tau)\in\mathcal H$, a family of distributions closed under consecutive sampling. If, when in state $(i,\tau)$, it is decided to sample at least once more, the supremum of the prior expected reward is given by equation (5.1.4) with $\beta = 1$.

Since we shall be concerned with large values of ν, we assume that, when it is decided to cease sampling, a terminal policy will be selected which maximizes the steady-state gain of the system, $\bar g(\delta,\tau)$. Therefore, when it is decided to cease sampling, the supremum of the expected reward over the terminal period is


$$\nu\max_{\delta\in\Delta}\bar g(\delta,\tau). \tag{5.5.1}$$
Thus, under the assumptions of Model III, $v_i(\tau;\nu)$ must satisfy the following functional equations:
$$v_i(\tau;\nu) = \max\Big\{\max_{k=1,\dots,K_i}\Big[\bar q_i^k(\tau) - c_i^k(\tau) + \sum_{j=1}^N\bar p_{ij}^k(\tau)\,v_j\big(\tau_{ij}^k(\tau);\nu\big)\Big],\ \nu\max_{\delta\in\Delta}\bar g(\delta,\tau)\Big\}, \qquad i = 1,\dots,N;\ \nu = 1,2,3,\dots \tag{5.5.2}$$
The arguments of Sections 5.1 and 5.2 required that the discount factor β be less than unity and, therefore, are not directly applicable to Model III. The existence and properties of solutions to (5.5.2) are matters for future investigation.


Equation (5.5.2) yields the supremum of the expected reward over a period with terminal phase of length ν and also yields a decision for the current transition interval. This decision, which is either the selection of an alternative to be sampled or a terminal policy, will be called a ν-optimal decision. A ν-optimal decision which is the same for all sufficiently large ν will be called an optimal decision. Since, for large ν, every ν-optimal decision maximizes the expected gain and also maximizes the total reward over the sampling period, it is seen that an optimal decision, as defined here, satisfies the criterion set forth at the beginning of the section. The existence and nature of optimal decisions have not yet been investigated.


5.5.2 Model IV. We now assume that the decision-maker can sample or use the process at any time prior to the terminal decision, independently of his past decisions. Let $v_i(\tau;\nu)$ be the supremum of the expected reward over a period with terminal operation phase of length ν when the system starts in state i with the prior distribution $H(P\mid\tau)$. It is assumed that $H(P\mid\tau)\in\mathcal H$, a family of distributions closed under ν-step sampling. Following the arguments of Section 5.3 and of the previous paragraph, it is seen that $v_i(\tau;\nu)$ must satisfy the following functional equation:
$$v_i(\tau;\nu) = \max\Big\{\max_{k=1,\dots,K_i}\Big[\bar q_i^k(\tau) - c_i^k(\tau) + \sum_{j=1}^N\bar p_{ij}^k(\tau)\,v_j\big(\tau_{ij}^k(\tau);\nu\big)\Big],\ \max_{n=1,2,\dots}\ \max_{\delta\in\Delta}\Big[\bar q_i^{(n)}(\delta,\tau) + \sum_{j=1}^N\bar p_{ij}^{(n)}(\delta,\tau)\,v_j\big(\tau_{ij}(n,\delta,\tau);\nu\big)\Big],\ \nu\max_{\delta\in\Delta}\bar g(\delta,\tau)\Big\}, \qquad i = 1,\dots,N;\ \nu = 1,2,3,\dots \tag{5.5.3}$$
The remarks made above concerning ν-optimal decisions and optimal decisions apply as well to Model IV.

By approaching optimal decisions for the undiscounted terminal control models by means of ν-optimal decisions, we have emphasized the fact that an undiscounted infinite horizon model is an approximation to a system which runs for a long, but finite, period. We can equally well view the undiscounted model as an approximation to a system with a discount factor very close to unity. Thus, another approach to the solution of the undiscounted terminal control problem is to let $\beta\to 1$ in Models I and II. The existence and properties of solutions obtained in this manner and their relation to solutions obtained via ν-optimal decisions have not yet been investigated.
5.6 Processes with Set-up Costs.

In many processes which can be modelled as a Markov chain with alternatives there is a cost associated with changing alternatives in each state. Such a set-up cost could include, for example, the cost of starting the operation of alternative k and of shutting down alternative j when in state i. Set-up costs can easily be introduced into Models I and II; we illustrate how this is done in Model I for a fixed cost, S, which is incurred for each change of alternative made. The method is easily generalized to the case in which S is a function of the state in which the change is made and of the alternatives involved in the change, and is also applicable to the adaptive control model of Chapter 3.

Let $\delta = (\delta_1,\dots,\delta_N)$ denote the policy under which the system is currently operating, where $\delta_i$ is the index of the alternative in use in the ith state $(\delta_i = 1,\dots,K_i)$. We now define the generalized state of the system as $(i,\tau,\delta)$, where i is the physical state of the system $(i = 1,\dots,N)$, τ indexes the prior distribution of P $(H(P\mid\tau)\in\mathcal H)$, and δ is the policy currently in use $(\delta\in\Delta)$. Let $v_i(\tau,\delta)$ be the supremum of the expected total discounted reward over an infinite period if the system starts in the generalized state $(i,\tau,\delta)$. The prior distribution function of P is assumed to belong to a family of distributions closed under consecutive sampling.

If the system is in state $(i,\tau,\delta)$ and it is decided to sample alternative k, the supremum of the expected reward is
$$(\delta_{k\delta_i} - 1)S + \bar q_i^k(\tau) - c_i^k(\tau) + \beta\sum_{j=1}^N\bar p_{ij}^k(\tau)\,v_j\big(\tau_{ij}^k(\tau),\delta'\big), \tag{5.6.1}$$
where $\delta'$ is the policy δ with its ith component replaced by k, and
$$\delta_{k\delta_i} = \begin{cases}1, & k = \delta_i,\\ 0, & k \ne \delta_i\end{cases} \tag{5.6.2}$$
is the Kronecker delta. The quantities $\bar q_i^k(\tau)$ and $c_i^k(\tau)$ are defined by equations (3.2.4) and (5.1.3), respectively.
If it is decided to cease sampling when the system is in state $(i,\tau,\delta)$ and operate indefinitely under the policy $\delta^0 = (\delta_1^0,\dots,\delta_N^0)$, the expected reward is
$$S\sum_{m=1}^N\big(\delta_{\delta_m^0\delta_m} - 1\big) + \bar V_i(\delta^0,\tau), \tag{5.6.3}$$
where $\bar V_i(\delta^0,\tau)$ is the prior expected discounted reward for operating the system over an infinite period under the policy $\delta^0$, starting from state i, when $H(P\mid\tau)$ is the prior distribution of P.
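The set-up cost bookkeeping in the sampling and terminal expressions above reduces to checking and counting changed alternatives. A minimal sketch, with policies represented (as an assumption of this illustration) by tuples of 0-based alternative indices; the function names are hypothetical.

```python
def sampling_setup_charge(current, i, k, S):
    """(delta_{k, current_i} - 1) * S: pay S only if alternative k is not already in use in state i."""
    return 0.0 if current[i] == k else -S

def policy_after_sampling(current, i, k):
    """Policy in use after alternative k is operated in state i (assumed to replace the old one)."""
    new = list(current)
    new[i] = k
    return tuple(new)

def terminal_setup_charge(current, terminal, S):
    """S * sum_m (delta_{terminal_m, current_m} - 1): pay S for every state whose alternative changes."""
    changes = sum(1 for a, b in zip(current, terminal) if a != b)
    return -S * changes

current = (0, 1, 1)                                      # hypothetical policy in use
print(sampling_setup_charge(current, 0, 2, 2.5))         # switching in state 0 costs S
print(policy_after_sampling(current, 0, 2))
print(terminal_setup_charge(current, (1, 1, 0), 2.5))   # two states change
```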

Thus, in Model I with a fixed set-up cost, $v_i(\tau,\delta)$ must satisfy the following functional equations:
CHAPTER 6
DISTRIBUTION THEORY


In this chapter we introduce some probability mass functions and density functions which will be required for the next chapter, where we do the prior-posterior and preposterior analysis of a Markov chain observed under the consecutive sampling rule. The Whittle, Whittle-1, and Whittle-2 probability mass functions are defined in Section 6.1 and formulas for their moments are derived. The multivariate beta density function is considered in Section 6.2 and is used to define the matrix beta density function in Section 6.3. Some extensions of the matrix beta distribution are considered in Section 6.4 and the chapter concludes with a discussion of the beta-Whittle probability mass function.

The multivariate beta density function, as defined by equation (6.2.2) below, was introduced by Mauldon [30] in 1959; Mosimann [31] has studied the main properties of this distribution. The matrix beta distribution was used by Silver [36], but not under that name. The Whittle and beta-Whittle distributions are original with the present work.


6.1 The Whittle Distribution and Related Distributions.

Let $x = (x_0, x_1, \dots, x_n)$ be the sequence of consecutive observations of the states of a Markov chain over a period of n transitions, where $x_0$ is the state of the system prior to the first transition. The range set of the random variables $x_m$ is the set of integers which index the states of the chain, $\{1,\dots,N\}$. It is assumed that the transitions are governed by a known N×N stochastic matrix, $P = [p_{ij}]$, and that the distribution of the initial state, $x_0$, is a known stochastic row vector, $p = (p_1,\dots,p_N)$.

Given a sample outcome, x, we define the statistic $f_{ij}$ as the number of indices $m\in\{0,1,\dots,n-1\}$ such that $x_m = i$ and $x_{m+1} = j$ $(i,j = 1,\dots,N)$. In other words, $f_{ij}$ is the number of occurrences of a transition from state i to state j in the sample x. Let $F = [f_{ij}]$, an N×N matrix, be the transition count of the sample. Prior to the observation of x, F and $x_0$ are random quantities whose joint distribution is studied in this section.


Let
$$f_{i\cdot} = \sum_{j=1}^N f_{ij}, \qquad i = 1,\dots,N, \tag{6.1.1}$$
$$f_{\cdot j} = \sum_{i=1}^N f_{ij}, \qquad j = 1,\dots,N, \tag{6.1.2}$$
be the row and column sums of F. With the exception of the initial and final transitions, every transition into state i in the sample x must be followed by a transition out of state i. Therefore, the elements of F are constrained by the equations
$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1,\dots,N, \tag{6.1.3}$$
where $u = x_0$ is the initial state and $v = x_n$ is the final state.


The following lemma shows that, given a transition count F and an initial state u, the final state of the sample is uniquely determined. A similar derivation shows that, given F and v, u is uniquely determined. Lemma 6.1.5, below, shows that F does not necessarily uniquely determine both u and v.


Lemma 6.1.1 Let $u\in\{1,\dots,N\}$ be fixed. If F is an N×N matrix of non-negative integers which satisfies the equations
$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1,\dots,N, \tag{6.1.4}$$
for some integer $v\in\{1,\dots,N\}$, then v is the only integer in $\{1,\dots,N\}$ for which (6.1.4) is true.

Proof. The proof is by contradiction. Assume that v and w both satisfy (6.1.4). If $u = v$, then
$$f_{i\cdot} - f_{\cdot i} = 0 = \delta_{iu} - \delta_{iw}, \qquad i = 1,\dots,N,$$
which implies that $w = u = v$. If $u \ne v$, then
$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv} = \delta_{iu} - \delta_{iw}, \qquad i = 1,\dots,N.$$
Thus,
$$\delta_{iv} = \delta_{iw}, \qquad i = 1,\dots,N,$$
and $w = v$. Q.E.D.


Let $I = \{0, 1, 2, \dots\}$ denote the set of all nonnegative integers.
For fixed $u \in \{1, \dots, N\}$, $v \in \{1, \dots, N\}$, $n \in \{1, 2, 3, \dots\}$, and $P$,
define the following set of $N \times N$ matrices, $F = [f_{ij}]$:

$$\mathcal{F}_n(u,v,P) = \Big\{ F : f_{ij} \in I;\ \sum_{i=1}^{N}\sum_{j=1}^{N} f_{ij} = n;\ f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv},\ i = 1, \dots, N;\ f_{ij} = 0 \text{ if } p_{ij} = 0,\ i, j = 1, \dots, N \Big\} \qquad (6.1.5)$$

Let

$$\mathcal{F}_n(u,P) = \bigcup_{v=1}^{N} \mathcal{F}_n(u,v,P), \qquad u = 1, \dots, N \qquad (6.1.6)$$

It is clear that $\mathcal{F}_n(u,P)$ is the set of all possible transition counts
$F$ which can arise from a sample of $n$ consecutive transitions in a Markov
chain with transition matrix $P$ and initial state $u$.


6.1.1 The Whittle Distribution. The $N \times N$ random matrix $F = [f_{ij}]$
with range set $\mathcal{F}_n(u,P)$ is said to have the Whittle distribution with
parameter $(u,n,P)$ if $F$ has the joint probability mass function

$$f_W(F \mid u,n,P) = \begin{cases} F^{*}_{vu} \displaystyle\prod_{i=1}^{N} \frac{f_{i\cdot}!}{\prod_{j=1}^{N} f_{ij}!} \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}, & F \in \mathcal{F}_n(u,P) \\ 0, & \text{otherwise} \end{cases} \qquad (6.1.7)$$

The index $v$ is the unique solution of the equations

$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \dots, N$$

and $F^{*}_{vu}$ is the $(v,u)$th cofactor of the $N \times N$ matrix $F^{*} = [f^{*}_{ij}]$ defined
by

$$f^{*}_{ij} = \delta_{ij} - \frac{f_{ij}}{f_{i\cdot}}, \qquad f_{i\cdot} > 0 \qquad (6.1.8a)$$

$$f^{*}_{ij} = \delta_{ij}, \qquad f_{i\cdot} = 0 \qquad (6.1.8b)$$

Since, in (6.1.7), there may be some $p_{ij} = 0$, we use the convention $0^0 = 1$.
We have called the mass function (6.1.7) the Whittle distribution
because Whittle [41] was the first to show that

$$\sum_{F \in \mathcal{F}_n(u,P)} F^{*}_{vu} \prod_{i=1}^{N} \frac{f_{i\cdot}!}{\prod_{j=1}^{N} f_{ij}!} \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}} = 1 \qquad (6.1.9)$$

Whittle's derivation of (6.1.9), and subsequent proofs of this relation
by Dawson and Good [15] and by Goodman [22], were obtained under the
restriction $f_{i\cdot} > 0$ ($i = 1, \dots, N$). Billingsley [...], in a particularly
elegant proof of (6.1.9), did not require this restriction.
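The Whittle mass function can be checked numerically for a small chain. The sketch below uses Python with NumPy. It evaluates (6.1.7) and compares the result with the exact distribution of $F$, obtained by enumerating all $N^n$ paths. For this case it also confirms (6.1.9). The 3-state matrix `P`, the helper names `whittle_pmf` and `brute_force`, and the 0-based state labels are illustrative assumptions.

```python
import itertools
import math
import numpy as np

def whittle_pmf(F, u, P):
    """Whittle probability (6.1.7) of transition count F, given start u (0-based)."""
    F = np.asarray(F, dtype=int)
    P = np.asarray(P, dtype=float)
    N = len(P)
    row, col = F.sum(axis=1), F.sum(axis=0)
    target = np.eye(N, dtype=int)[u] - (row - col)     # indicator of v, by (6.1.3)
    if np.count_nonzero(target) != 1 or target.max() != 1:
        return 0.0                                     # F is not in F_n(u, P)
    v = int(np.argmax(target))
    # F* of (6.1.8a)-(6.1.8b): rows with f_{i.} = 0 stay equal to the identity row
    Fstar = np.eye(N)
    for i in range(N):
        if row[i] > 0:
            Fstar[i] -= F[i] / row[i]
    minor = np.delete(np.delete(Fstar, v, axis=0), u, axis=1)
    cofactor = (-1) ** (v + u) * np.linalg.det(minor)  # (v,u)th cofactor of F*
    multinom = 1.0
    for i in range(N):
        multinom *= math.factorial(int(row[i]))
        for j in range(N):
            multinom /= math.factorial(int(F[i, j]))
    # numpy evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1
    return cofactor * multinom * float(np.prod(P ** F))

def brute_force(u, n, P):
    """Exact distribution of F obtained by enumerating every path of length n."""
    N = len(P)
    dist = {}
    for path in itertools.product(range(N), repeat=n):
        x = (u,) + path
        prob, F = 1.0, np.zeros((N, N), dtype=int)
        for a, b in zip(x[:-1], x[1:]):
            prob *= P[a, b]
            F[a, b] += 1
        key = tuple(map(tuple, F))
        dist[key] = dist.get(key, 0.0) + prob
    return dist

P = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
exact = brute_force(0, 4, P)
max_err = max(abs(whittle_pmf(np.array(F), 0, P) - pr) for F, pr in exact.items())
```

The cofactor times the multinomial factor counts the paths from $u$ to $v$ that share the count $F$. The brute-force enumeration is only feasible for tiny $N$ and $n$, but it gives an independent reference.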





6.1.2 Moments of the Whittle Distribution. We now derive expressions
for the means, variances, and covariances of the elements of $F$. Before
presenting these results, however, it is necessary to summarize certain
facts from the theory of matrices.

Let $P$ be an $N \times N$ matrix with eigenvalues $\lambda_1, \dots, \lambda_N$, assumed to be
distinct. Let $g(x)$ be an arbitrary scalar polynomial, $a_0 + a_1 x + \dots + a_r x^r$,
and let $g(P)$ be the corresponding matrix polynomial, $a_0 I + a_1 P + \dots + a_r P^r$.
Sylvester's Theorem states that

$$g(P) = \sum_{k=1}^{N} g(\lambda_k) A^{(k)} \qquad (6.1.10)$$

where the $N \times N$ matrices $A^{(k)}$ are defined by the expression

$$A^{(k)} = \prod_{\substack{m=1 \\ m \ne k}}^{N} \frac{P - \lambda_m I}{\lambda_k - \lambda_m}, \qquad k = 1, \dots, N \qquad (6.1.11)$$

These matrices have the following properties:

$$A^{(j)} A^{(k)} = 0, \qquad j \ne k \qquad (6.1.12a)$$

$$A^{(k)} A^{(k)} = A^{(k)}, \qquad k = 1, \dots, N \qquad (6.1.12b)$$

A form similar to (6.1.10), called the confluent form of Sylvester's
Theorem, is available in the case of repeated eigenvalues.
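For a matrix with distinct eigenvalues, the constituent matrices of (6.1.11) can be formed directly, and properties (6.1.10)-(6.1.12) can then be checked numerically. This sketch assumes NumPy, and the example matrix `P` is hypothetical. The eigenvalues are sorted by modulus for convenience, which anticipates the convention adopted below for ergodic chains.

```python
import numpy as np

def sylvester_constituents(P):
    """Eigenvalues and constituent matrices A^(k) of (6.1.11).

    Assumes the eigenvalues of P are distinct. They are sorted by decreasing
    modulus, so for an ergodic stochastic P the unit root comes first.
    """
    P = np.asarray(P, dtype=float)
    lam = np.linalg.eigvals(P)
    lam = lam[np.argsort(-np.abs(lam))]
    I = np.eye(len(P))
    A = []
    for k, lk in enumerate(lam):
        Ak = I.astype(complex)
        for m, lm in enumerate(lam):
            if m != k:
                Ak = Ak @ (P - lm * I) / (lk - lm)
        A.append(Ak)
    return lam, A

P = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
lam, A = sylvester_constituents(P)
# (6.1.10) with g(x) = x^5
P5 = sum(l ** 5 * Ak for l, Ak in zip(lam, A))
```

The factors in (6.1.11) are all polynomials in $P$ and therefore commute, so the order of multiplication in the inner loop does not matter.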

If $P$ is an ergodic stochastic matrix, that is, if $P$ is the transition
matrix of a single nonperiodic Markov chain, then exactly one eigenvalue
has the value unity and all other eigenvalues have moduli less than unity.


We shall adopt the convention that $\lambda_1$ is the unit root:

$$\lambda_1 = 1 \qquad (6.1.13a)$$

$$|\lambda_k| < 1, \qquad k = 2, \dots, N \qquad (6.1.13b)$$

Then the matrix $A^{(1)}$ is an $N \times N$ matrix each row of which is the steady-state
vector $\pi = (\pi_1, \dots, \pi_N)$ defined by the relations $\pi = \pi P$ and $\sum_{i=1}^{N} \pi_i = 1$.
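The steady-state vector is found by solving $\pi = \pi P$ together with $\sum_i \pi_i = 1$. The sketch below does this by least squares, assuming NumPy and a hypothetical `P`. It also checks numerically that the rows of a high power of $P$ approach $\pi$. This is what $P^m = \sum_k \lambda_k^m A^{(k)}$ together with $|\lambda_k| < 1$ for $k \ge 2$ implies.

```python
import numpy as np

def steady_state(P):
    """Solve pi = pi P together with sum(pi) = 1 (unique when P is ergodic)."""
    P = np.asarray(P, dtype=float)
    N = len(P)
    # Stack the N equations (P^T - I) pi^T = 0 with the normalisation row.
    A = np.vstack([P.T - np.eye(N), np.ones((1, N))])
    b = np.zeros(N + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
pi = steady_state(P)
# Since P^m = sum_k lambda_k^m A^(k) and |lambda_k| < 1 for k >= 2,
# a high power of P is numerically equal to A^(1), whose rows are all pi.
A1_approx = np.linalg.matrix_power(P, 60)
```

The stacked system is consistent, so least squares returns its exact solution. A dense solve of one reduced system would work equally well.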


Theorem 6.1.2 If the $N \times N$ random matrix $F$ has the Whittle
distribution with parameter $(u,n,P)$, then the expected value of $f_{ij}$ is

$$E[f_{ij}] = \sum_{m=0}^{n-1} p_{ui}^{(m)} p_{ij}, \qquad i, j = 1, \dots, N \qquad (6.1.14)$$

where $p_{ui}^{(m)}$ is the $(u,i)$th element of $P^m$. If, furthermore, $P$ is ergodic
and the eigenvalues $\lambda_1, \dots, \lambda_N$ of $P$ are distinct, then the expected
value of $f_{ij}$ has the spectral representation

$$E[f_{ij}] = n \pi_i p_{ij} + \sum_{k=2}^{N} a_{ui}^{(k)} p_{ij} \frac{1 - \lambda_k^n}{1 - \lambda_k}, \qquad i, j = 1, \dots, N;\ n = 1, 2, 3, \dots \qquad (6.1.15)$$

where $A^{(k)} = [a_{ij}^{(k)}]$ is defined by (6.1.11).

Proof. Let $f_{ij}(u,n)$ be the number of transitions from $i$ to $j$ in a
sample of $n$ transitions which has initial state $u$. Prior to the
observation of the sample, $f_{ij}(u,n)$ is a random variable. If the system
starts in state $u$ and the first transition is to state $k$, then $f_{ij}(u,n)$
satisfies the equations

$$f_{ij}(u,n) = \delta_{ui}\delta_{kj} + f_{ij}(k, n-1), \qquad n = 2, 3, \dots \qquad (6.1.16a)$$

$$f_{ij}(u,1) = \delta_{ui}\delta_{kj} \qquad (6.1.16b)$$

Thus, taking expectations over the first transition, $E[f_{ij}(u,n)]$ satisfies the equations

$$E[f_{ij}(u,n)] = \delta_{ui} p_{ij} + \sum_{k=1}^{N} p_{uk} E[f_{ij}(k, n-1)], \qquad n = 2, 3, \dots \qquad (6.1.17a)$$

$$E[f_{ij}(u,1)] = \delta_{ui} p_{ij} \qquad (6.1.17b)$$

We shall prove inductively that

$$E[f_{ij}(u,n)] = \sum_{m=0}^{n-1} p_{ui}^{(m)} p_{ij}, \qquad u, i, j = 1, \dots, N;\ n = 1, 2, 3, \dots \qquad (6.1.18)$$

Since $p_{ui}^{(0)} = \delta_{ui}$, (6.1.17b) is satisfied by (6.1.18) for $n = 1$. Assume
(6.1.18) holds for $n - 1$. Then, using (6.1.17a),

$$E[f_{ij}(u,n)] = \delta_{ui} p_{ij} + \sum_{k=1}^{N} p_{uk} \sum_{m=0}^{n-2} p_{ki}^{(m)} p_{ij} = \delta_{ui} p_{ij} + \sum_{m=0}^{n-2} p_{ui}^{(m+1)} p_{ij} = \sum_{m=0}^{n-1} p_{ui}^{(m)} p_{ij} \qquad (6.1.19)$$

proving the induction.
If all the eigenvalues of $P$ are distinct, Sylvester's Theorem yields

$$P^m = \sum_{k=1}^{N} \lambda_k^m A^{(k)}, \qquad m = 0, 1, 2, \dots \qquad (6.1.20)$$

If, furthermore, $P$ is ergodic and $\lambda_1$ is the only eigenvalue of unit
modulus, equation (6.1.18) can be written
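Both forms of the mean in Theorem 6.1.2 are easy to evaluate. The NumPy sketch below computes (6.1.14) by accumulating row $u$ of successive powers of $P$. It computes (6.1.15) from the constituent matrices of (6.1.11). It assumes an ergodic `P` with distinct eigenvalues and 0-based states, and the function names are illustrative.

```python
import numpy as np

def expected_counts(P, u, n):
    """E[f_ij] from (6.1.14): sum_{m=0}^{n-1} p_ui^(m) p_ij (states 0-based)."""
    P = np.asarray(P, dtype=float)
    visits = np.zeros(len(P))        # accumulates sum_m p_ui^(m)
    r = np.zeros(len(P))
    r[u] = 1.0                       # row u of P^0
    for _ in range(n):
        visits += r
        r = r @ P                    # row u of the next power of P
    return visits[:, None] * P

def expected_counts_spectral(P, u, n):
    """Spectral form (6.1.15); assumes P ergodic with distinct eigenvalues."""
    P = np.asarray(P, dtype=float)
    lam = np.linalg.eigvals(P)
    lam = lam[np.argsort(-np.abs(lam))]       # lam[0] is the unit root
    I = np.eye(len(P))
    E = np.zeros_like(P, dtype=complex)
    for k, lk in enumerate(lam):
        Ak = I.astype(complex)
        for m, lm in enumerate(lam):
            if m != k:
                Ak = Ak @ (P - lm * I) / (lk - lm)
        weight = n if k == 0 else (1 - lk ** n) / (1 - lk)
        E += weight * Ak[u][:, None] * P      # a_ui^(k) p_ij times the weight
    return E.real

P = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
E_direct = expected_counts(P, u=0, n=4)
E_spectral = expected_counts_spectral(P, u=0, n=4)
```

The direct sum costs $O(nN^2)$. The spectral form costs the same for every $n$ once the $A^{(k)}$ are known, which is why it pays off as $n$ grows.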
Theorem 6.1.3 If the $N \times N$ random matrix $F$ has the Whittle
distribution with parameter $(u,n,P)$, then, for $i, j, r, s = 1, \dots, N$, the
covariance between $f_{ij}$ and $f_{rs}$ is

$$\operatorname{cov}[f_{ij}, f_{rs}] = \cdots \qquad (6.1.22)$$

If $P$ is ergodic and the eigenvalues of $P$, $\lambda_1, \dots, \lambda_N$, are all distinct,
the covariance of $f_{ij}$ and $f_{rs}$ has the spectral representation

when $n > 1$.

first transition is from $u$ to $k$,

It is clear from (6.1.17b) that (6.1.27b) equals (6.1.26b). The case
$n = 2, 3, \dots$ will be proven by induction. For $n = 2$, it is easily verified that
(6.1.26a) is equal to the expression in (6.1.27a). Assume (6.1.27a)
satisfies (6.1.26a) for $n - 1$. Then, using (6.1.14),

In Theorems 6.1.2 and 6.1.3 the spectral representations provide an
efficient method of computing the means, variances, and covariances of
elements of $F$. This method is particularly useful as the parameter $n$
becomes large and, in fact, leads to relatively simple approximations for
$E[f_{ij}]$ and $\operatorname{cov}[f_{ij}, f_{rs}]$ when $n$ is sufficiently large, as is shown
in the following corollary.
Corollary 6.1.4 If the $N \times N$ random matrix $F$ has the Whittle
distribution with parameter $(u,n,P)$, where $P$ is ergodic and has distinct
eigenvalues, then, for large $n$, the expected value of $f_{ij}$ and the covariance
Proof. Equations (6.1.33) and (6.1.35) are obtained by letting $n$
become large in (6.1.15) and (6.1.23), dropping terms of order
$\lambda_k^n$ ($k = 2, \dots, N$), and noting that

$$\lim_{n \to \infty} \lambda_k^n = 0, \qquad k = 2, \dots, N \qquad (6.1.36)$$

Q.E.D.
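Dropping the $\lambda_k^n$ terms from (6.1.15), as the proof describes, leaves $E[f_{ij}] \approx n\pi_i p_{ij} + p_{ij}\sum_{k=2}^{N} a_{ui}^{(k)}/(1-\lambda_k)$. The sketch below illustrates that truncation and is our own reading of it, not a transcription of (6.1.33). It assumes NumPy and a hypothetical `P`. It evaluates $\sum_{k\ge 2} A^{(k)}/(1-\lambda_k)$ through the identity $(I - P + A^{(1)})^{-1} - A^{(1)}$, which follows from (6.1.10)-(6.1.12). This avoids computing eigenvalues at all.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
N, u = len(P), 0

# Steady-state vector pi (pi = pi P, sum(pi) = 1) and Pi = A^(1) = 1 pi.
A = np.vstack([P.T - np.eye(N), np.ones((1, N))])
b = np.r_[np.zeros(N), 1.0]
pi = np.linalg.lstsq(A, b, rcond=None)[0]
Pi = np.tile(pi, (N, 1))

# sum_{k>=2} A^(k) / (1 - lambda_k) = (I - P + A^(1))^{-1} - A^(1), by (6.1.12).
D = np.linalg.inv(np.eye(N) - P + Pi) - Pi

def approx_counts(n):
    """Large-n approximation: n pi_i p_ij + p_ij * D[u, i]."""
    return (n * pi + D[u])[:, None] * P

def exact_counts(n):
    """Exact mean (6.1.14) for comparison."""
    visits, r = np.zeros(N), np.eye(N)[u]
    for _ in range(n):
        visits += r
        r = r @ P
    return visits[:, None] * P
```

The neglected terms are of order $|\lambda_2|^n$. For this matrix $|\lambda_2| \approx 0.33$, so the approximation is already exact to machine precision at moderate $n$.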


6.1.3 The Whittle-1 Distribution. Let $U$ be a random integer with
range set $\{1, \dots, N\}$ and let $F = [f_{ij}]$ be an $N \times N$ random matrix with
range set $\mathcal{F}_n(U,P)$. The ordered pair $(U,F)$ is said to have the Whittle-1
distribution with parameter $(p,n,P)$ if $(U,F)$ has the joint probability mass
function

$$f_{W1}(u,F \mid p,n,P) = \begin{cases} p_u f_W(F \mid u,n,P), & u = 1, \dots, N;\ F \in \mathcal{F}_n(u,P) \\ 0, & \text{otherwise} \end{cases} \qquad (6.1.37)$$

Since $p_u \ge 0$ and $\sum_u p_u = 1$, it is clear that

$$f_{W1}(u,F \mid p,n,P) \ge 0$$

and, using equation (6.1.9),

$$\sum_{u=1}^{N} \sum_{F \in \mathcal{F}_n(u,P)} f_{W1}(u,F \mid p,n,P) = 1. \qquad (6.1.38)$$

It is readily seen that, if $(U,F)$ has the Whittle-1 distribution with
parameter $(p,n,P)$, the marginal distribution of $U$ is

$$f(u \mid p) = \begin{cases} p_u, & u = 1, \dots, N \\ 0, & \text{otherwise} \end{cases} \qquad (6.1.39)$$

The marginal distribution of $F$ is considered in the remaining paragraphs
of this section.


6.1.4 The Whittle-2 Distribution. Let $F$ be an $N \times N$ random matrix
with range set

The Whittle-2 distribution with parameter $(p,n,P)$ is defined as the
marginal distribution of $F$ when $(U,F)$ has the Whittle-1 distribution with
parameter $(p,n,P)$.

It is clear from the definition (6.1.42) and the fact that
$f_{W1}(u,F \mid p,n,P)$ is a probability mass function that

lemma is required. To this end let $\mathcal{F}_n(P)$ be partitioned into two
sets, $\mathcal{F}_{n1}(P)$ and $\mathcal{F}_{n2}(P)$, defined as

$$\mathcal{F}_{n2}(P) = \mathcal{F}_n(P) - \mathcal{F}_{n1}(P) \qquad (6.1.43)$$

$\mathcal{F}_{n1}(P)$ is the set of all transition counts which start and end in the
same state and $\mathcal{F}_{n2}(P)$ is the set of all other transition counts in
$\mathcal{F}_n(P)$. Both sets are nonempty.


Lemma 6.1.5 Let the sets $\mathcal{F}_{n1}(P)$ and $\mathcal{F}_{n2}(P)$ be defined by
equations (6.1.42) and (6.1.43). If $F \in \mathcal{F}_{n1}(P)$, there are exactly $N$ pairs
of integers, $(u,v) = (w,w)$, $w = 1, \dots, N$, which satisfy the equations

If $F \in \mathcal{F}_{n2}(P)$ there is, by the definition of $\mathcal{F}_{n2}(P)$, at least
one solution, $(u,v)$, to (6.1.44) with $u \ne v$. Assume $(u',v')$ also
satisfies (6.1.44). If $u' = u$, Lemma 6.1.1 implies $v' = v$. Assume
$u' \ne u$. Then, if $v' \ne v$, substitution of $(u,v)$ in (6.1.44) yields, for
$i = v$,

$$f_{v\cdot} - f_{\cdot v} = -1$$

while $(u',v')$ substituted into (6.1.44) gives

$$f_{v\cdot} - f_{\cdot v} = \delta_{vu'} \ge 0,$$

a contradiction. If $v' = v$, then $(u',v')$ substituted into (6.1.44) with
$i = u'$ implies

$$f_{u'\cdot} - f_{\cdot u'} = 1 - \delta_{u'v}$$

and $(u,v)$ substituted into (6.1.44) with $i = u'$ yields

$$f_{u'\cdot} - f_{\cdot u'} = \delta_{u'u} - \delta_{u'v}.$$

Equating the two expressions gives $\delta_{u'u} = 1$, which contradicts the
assumption $u' \ne u$. Thus, $(u',v') = (u,v)$. Q.E.D.
Theorem 6.1.6 Let $F$ be an $N \times N$ random matrix with range set
$\mathcal{F}_n(P)$ which has the Whittle-2 distribution with parameter $(p,n,P)$.
Then the probability mass function of $F$ is given by

where $F^{*}_{vu}$ is the $(v,u)$th cofactor of the matrix $F^{*}$ defined by equation
(6.1.8) and, when $F \in \mathcal{F}_{n2}(P)$, $(u,v)$ is the unique solution to equation
(6.1.44).

Proof. By definition, $\mathcal{F}_{n1}(P)$ and $\mathcal{F}_{n2}(P)$ are mutually
exclusive sets which together exhaust the range set $\mathcal{F}_n(P)$. If
$F \in \mathcal{F}_{n1}(P)$ then, by Lemma 6.1.5, $F \in \mathcal{F}_n(u,u,P)$, $u = 1, \dots, N$, and

$$f_W(F \mid u,n,P) > 0, \qquad u = 1, \dots, N$$

which yields the first line of equation (6.1.45). If $F \in \mathcal{F}_{n2}(P)$, Lemma
6.1.5 implies there is exactly one value of $u$ in the range $\{1, \dots, N\}$
such that

$$f_W(F \mid u,n,P) > 0,$$

which yields the second line of (6.1.45). Q.E.D.


6.1.5 Moments of the Whittle-2 Distribution. In this paragraph we
derive formulas for the means of the $f_{ij}$ and the covariances
between $f_{ij}$ and $f_{rs}$ when $F$ has the Whittle-2 distribution with parameter
$(p,n,P)$. When $P$ is ergodic and $p = \pi$, the steady-state distribution
corresponding to $P$, particularly simple formulas result. Related moments
have been derived by other authors. Anderson and Goodman [1], assuming
that many Markov chains which are governed by the same transition matrix
are simultaneously observed, find expressions for the means, variances,
and covariances of $n_{ij}(t)$, the number of systems making a transition from
state $i$ to state $j$ on the $t$th transition ($i, j = 1, \dots, N$). Good [20] has
derived formulas for the mean vector and variance-covariance matrix of the

is the number of times the system is observed to be in state $i$ (including
the initial state, but not the final state) in a sample of $n$ consecutive
transitions, assuming that the distribution of the initial state is $\pi$,
the steady-state distribution corresponding to $P$. Our equations (6.1.53)
and (6.1.55), when summed over $j$ and over $j$ and $s$, respectively, reduce
to Good's formulas.


Theorem 6.1.7 Let the $N \times N$ random matrix $F$ have the Whittle-2
distribution with parameter $(p,n,P)$. Then the expected value of $f_{ij}$
is

where $A^{(k)} = [a_{ij}^{(k)}]$ are defined by (6.1.11).


Proof. Equation (6.1.46) follows immediately from equation (6.1.14). When $P$ is
ergodic with distinct eigenvalues, (6.1.47) follows from (6.1.15). Q.E.D.

Equation (6.1.49) follows immediately from (6.1.27) by using (6.1.52).
Using (6.1.52) together with (6.1.50) and the spectral representation (6.1.29),
equation (6.1.51) is obtained. Q.E.D.


Theorem 6.1.8 Let $F$ be an $N \times N$ random matrix which has the
Whittle-2 distribution with parameter $(\pi,n,P)$, where $P$ is ergodic and

representation,

Equation (6.1.53) follows immediately from (6.1.46). Using (6.1.56) in
equation (6.1.49a),

equation (6.1.54) follows from (6.1.57).
If $P$ has distinct eigenvalues, then, by (6.1.12a),

and, in this case, equation (6.1.51) reduces to (6.1.55). Q.E.D.
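For the Whittle-2 case, averaging (6.1.14) over the initial distribution gives $E[f_{ij}] = \sum_{m=0}^{n-1} (pP^m)_i\, p_{ij}$. When $p = \pi$, every $pP^m$ equals $\pi$, so the mean is exactly $n\pi_i p_{ij}$ for every $n$. The sketch below checks this and contrasts it with a chain started in a fixed state. It assumes NumPy, a hypothetical `P`, and 0-based states.

```python
import numpy as np

def expected_counts_w2(P, p, n):
    """E[f_ij] under Whittle-2 with initial distribution p:
    sum_u p_u E[f_ij | u] = sum_{m=0}^{n-1} (p P^m)_i p_ij."""
    P = np.asarray(P, dtype=float)
    visits = np.zeros(len(P))
    r = np.asarray(p, dtype=float)      # distribution of x_0
    for _ in range(n):
        visits += r
        r = r @ P                       # distribution of the next state
    return visits[:, None] * P

P = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
N = len(P)
A = np.vstack([P.T - np.eye(N), np.ones((1, N))])
pi = np.linalg.lstsq(A, np.r_[np.zeros(N), 1.0], rcond=None)[0]

n = 7
E_stationary = expected_counts_w2(P, pi, n)   # should equal n * pi_i * p_ij
E_from_state0 = expected_counts_w2(P, [1.0, 0.0, 0.0], n)
```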


6.2 The Multivariate Beta Distribution.

In this section we consider the multivariate beta distribution, which
is an extension of the beta distribution to $N$ dimensions. There are
several different generalizations of the beta distribution; this particular
one is due to Mauldon [30]. The moments of this distribution have been
derived by Mosimann [32], who also relates the multivariate beta distribution
to the gamma distribution. Some of Mosimann's results are presented
here for the sake of completeness; the proofs, for the most part, are
original.
vector, $p = (p_1, \dots, p_N)$, is said to have the multivariate beta distribution
with parameter $m$ if $p$ has the joint density function

normalizing constant, $B(m)$.

Since $B(m) > 0$, it is clear that

then follows that the multivariate beta density function, as defined by
equation (6.2.1), is a proper density function.

Let us make the integrand transformation

The method by which Theorem 6.2.2 was proved can be generalized to
provide an identity which will be useful in subsequent proofs.

6.2.2 The Multivariate Beta Distribution Function.
If the random stochastic vector $p$ has the multivariate beta
distribution with parameter $m$, then the multivariate beta distribution function

As is the case with the beta distribution function, there is no closed
expression for equation (6.2.18). We can, however, express $H(p \mid m)$
beta distributions is continuous in $M$. In Theorem 2.2.1 it was shown
that the matrix beta distribution is the natural conjugate distribution
for a Markov chain which is observed under the consecutive sampling rule.
It then follows that the family of matrix beta distributions is closed
under consecutive sampling. This property is used in the following theorem
to derive the moments of the matrix beta distribution.
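The closure property has a simple computational form. Suppose the prior is matrix beta with parameter $M'$ and the sample has transition count $F$. Then the posterior kernel is $\prod_{i,j} p_{ij}^{m'_{ij} + f_{ij} - 1}$, which is matrix beta with parameter $M' + F$. The Python sketch below performs this update and reports the posterior means $m''_{ij}/m''_i$. It assumes NumPy and 0-based states, and the prior, the sample, and the helper names are hypothetical.

```python
import numpy as np

def transition_count(x, N):
    """F = [f_ij], the number of i -> j transitions in the sample x (0-based)."""
    F = np.zeros((N, N), dtype=int)
    for a, b in zip(x[:-1], x[1:]):
        F[a, b] += 1
    return F

def matrix_beta_update(M_prior, x):
    """Posterior matrix beta parameter M'' = M' + F after observing x.

    The likelihood kernel prod p_ij^f_ij times the prior kernel
    prod p_ij^(m'_ij - 1) is prod p_ij^(m'_ij + f_ij - 1).
    """
    M_prior = np.asarray(M_prior, dtype=float)
    return M_prior + transition_count(x, len(M_prior))

M_prior = np.ones((2, 2))                      # hypothetical prior M'
M_post = matrix_beta_update(M_prior, [0, 0, 0, 1, 1, 0])
post_mean = M_post / M_post.sum(axis=1, keepdims=True)   # m''_ij / m''_i
```

Because only $F$ enters the update, any two samples with the same transition count lead to the same posterior.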


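Theorem 6.3.1 Let $P$ be a $K \times N$ random generalized stochastic matrix which
has the matrix beta distribution with parameter $M = [m_{kj}]$.
Then

$$E[p_{kj}] = \mu_{kj}(M) = \frac{m_{kj}}{m_k}, \qquad k = 1, \dots, K;\ j = 1, \dots, N \qquad (6.3.7)$$

$$\operatorname{var}[p_{kj}] = \frac{m_{kj}(m_k - m_{kj})}{m_k^2 (m_k + 1)}, \qquad k = 1, \dots, K;\ j = 1, \dots, N \qquad (6.3.8)$$

$$\operatorname{cov}[p_{kj}, p_{k\ell}] = -\frac{m_{kj} m_{k\ell}}{m_k^2 (m_k + 1)}, \qquad k = 1, \dots, K;\ j, \ell = 1, \dots, N;\ j \ne \ell \qquad (6.3.9a)$$

$$\operatorname{cov}[p_{kj}, p_{r\ell}] = 0, \qquad k \ne r \qquad (6.3.9b)$$

where

$$m_k = \sum_{j=1}^{N} m_{kj}, \qquad k = 1, \dots, K \qquad (6.3.10)$$

Proof. Let $T_{kj}(M)$ be the matrix $M$ with the element $m_{kj}$ increased
by unity. Then $p_{kj} f_\beta(P \mid M)$ is proportional to the matrix beta density with
parameter $T_{kj}(M)$, and

$$E[p_{kj}] = \int p_{kj}\, f_\beta(P \mid M)\, dP = \frac{\Gamma(m_{kj} + 1)\,\Gamma(m_k)}{\Gamma(m_{kj})\,\Gamma(m_k + 1)} = \frac{m_{kj}}{m_k} \qquad (6.3.11)$$

For $k \ne r$, $p_{kj}$ and $p_{r\ell}$ are independent random variables and
$\operatorname{cov}[p_{kj}, p_{r\ell}] = 0$. If $k = r$, Lemma 2.3.2 yields

$$E[p_{kj}^2] = \frac{m_{kj}(m_{kj} + 1)}{m_k (m_k + 1)}$$

$$E[p_{kj} p_{k\ell}] = \frac{m_{kj} m_{k\ell}}{m_k (m_k + 1)}, \qquad j \ne \ell$$

from which equations (6.3.8) and (6.3.9a) follow. Q.E.D.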


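By writing equation (6.3.8) as

$$\operatorname{var}[p_{kj}] = \frac{\mu_{kj}(1 - \mu_{kj})}{m_k + 1} \qquad (6.3.12a)$$

where

$$\mu_{kj} = \frac{m_{kj}}{m_k}, \qquad (6.3.12b)$$

and noting that $0 < \mu_{kj} < 1$, $\mu_{kj}(1 - \mu_{kj}) \le \tfrac{1}{4}$ and $m_k + 1 > 1$, it is seen that

$$0 < \operatorname{var}[p_{kj}] < \tfrac{1}{4}, \qquad k = 1, \dots, K;\ j = 1, \dots, N \qquad (6.3.13)$$

Similarly, equation (6.3.9a) can be written as

$$\operatorname{cov}[p_{kj}, p_{k\ell}] = -\frac{\mu_{kj}\mu_{k\ell}}{m_k + 1} \qquad (6.3.14)$$

and, since

$$\mu_{k\ell} \le 1 - \mu_{kj}, \qquad (6.3.15)$$

we have the bounds

$$-\operatorname{var}[p_{kj}] \le \operatorname{cov}[p_{kj}, p_{k\ell}] < 0 \qquad (6.3.16a)$$

$$-\operatorname{var}[p_{k\ell}] \le \operatorname{cov}[p_{kj}, p_{k\ell}] < 0 \qquad (6.3.16b)$$

for $k = 1, \dots, K$; $j, \ell = 1, \dots, N$; $j \ne \ell$.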
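The moment formulas of Theorem 6.3.1 and the bounds (6.3.13)-(6.3.16) can be checked numerically. The sketch below assumes NumPy and treats each row of $P$ as an independent draw from the multivariate beta distribution, which is the distribution NumPy calls Dirichlet. This matches the way the matrix beta density factors by rows. The function `matrix_beta_moments` and the parameter `M` are illustrative.

```python
import numpy as np

def matrix_beta_moments(M):
    """Means, variances and within-row covariances of the matrix beta
    distribution, following (6.3.7), (6.3.8) and (6.3.9a).

    Returns mu (K x N), var (K x N) and cov (K x N x N) with
    cov[k, j, l] = cov[p_kj, p_kl]; the diagonal holds the variances.
    Entries in different rows are uncorrelated by (6.3.9b).
    """
    M = np.asarray(M, dtype=float)
    mk = M.sum(axis=1, keepdims=True)                 # m_k, eq. (6.3.10)
    mu = M / mk                                       # eq. (6.3.7)
    var = mu * (1 - mu) / (mk + 1)                    # eq. (6.3.12a)
    cov = -mu[:, :, None] * mu[:, None, :] / (mk[:, :, None] + 1)   # (6.3.14)
    idx = np.arange(M.shape[1])
    cov[:, idx, idx] = var
    return mu, var, cov

M = np.array([[2.0, 3.0, 5.0], [0.5, 0.5, 1.0]])    # hypothetical parameter
mu, var, cov = matrix_beta_moments(M)

# Monte Carlo check of the first row: Dirichlet(m_1) draws.
rng = np.random.default_rng(0)
draws = rng.dirichlet(M[0], size=200_000)
```

Each row of `cov` sums to zero because $\sum_\ell p_{k\ell} = 1$ has zero variance. This restates the fact that the covariances must be negative.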
When $P$ has the matrix beta distribution, the rows of $P$ are mutually
independent; thus, the general joint moment is

$$E\Big[\prod_{k=1}^{K}\prod_{j=1}^{N} p_{kj}^{v_{kj}}\Big] = \prod_{k=1}^{K} E\Big[\prod_{j=1}^{N} p_{kj}^{v_{kj}}\Big] \qquad (6.3.17)$$

where the $v_{kj}$ are nonnegative integers. Let

$$E\Big[\prod_{k=1}^{K}\prod_{j=1}^{N} p_{kj}^{v_{kj}} \,\Big|\, M\Big] = \int \prod_{k=1}^{K}\prod_{j=1}^{N} p_{kj}^{v_{kj}}\, f_\beta(P \mid M)\, dP \qquad (6.3.18)$$
The following theorem provides a recursive formula for computing this
expectation.

Theorem 6.3.2 If the $K \times N$ random matrix $P$ has the matrix beta
distribution with parameter $M$, then

$$E\Big[\prod_{i=1}^{K}\prod_{j=1}^{N} p_{ij}^{v_{ij}} \,\Big|\, M\Big] = \mu_{ka}(M)\, E\Big[\prod_{i=1}^{K}\prod_{j=1}^{N} p_{ij}^{v'_{ij}} \,\Big|\, T_{ka}(M)\Big], \qquad k = 1, \dots, K \qquad (6.3.19)$$

where the $v_{ij}$ are nonnegative integers, $a$ is any index such that
$v_{ka} > 0$, and $v'_{ij} = v_{ij} - \delta_{ik}\delta_{ja}$.

Proof. The theorem follows immediately by applying Lemma 2.3.2 to
equation (6.3.18). Q.E.D.
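Theorem 6.3.2 reduces any joint moment to a product of ratios $\mu_c(m) = m_c / \sum_j m_j$. The Python sketch below applies that recursion iteratively to a single row and compares the result with the Gamma-function closed form. The function names are illustrative. The matrix case simply multiplies row moments, as (6.3.17) allows.

```python
import math

def row_joint_moment(v, m):
    """E[prod_j p_j^{v_j}] for one multivariate beta row with parameter m,
    computed by the recursion of Theorem 6.3.2: remove one factor p_c with
    v_c > 0, multiply by mu_c(m) = m_c / sum(m), and raise m_c by unity."""
    v, m = list(v), list(m)
    result = 1.0
    while any(v):
        c = next(j for j, vj in enumerate(v) if vj > 0)
        result *= m[c] / sum(m)
        v[c] -= 1
        m[c] += 1
    return result

def matrix_joint_moment(V, M):
    """Product of row moments, using the row independence in (6.3.17)."""
    return math.prod(row_joint_moment(v, m) for v, m in zip(V, M))

def row_joint_moment_closed(v, m):
    """Gamma(sum m) / Gamma(sum m + sum v) * prod Gamma(m_j + v_j) / Gamma(m_j)."""
    s = math.lgamma(sum(m)) - math.lgamma(sum(m) + sum(v))
    s += sum(math.lgamma(mj + vj) - math.lgamma(mj) for mj, vj in zip(m, v))
    return math.exp(s)
```

For example, $E[p_1^2]$ with $m = (2, 3, 5)$ is $\tfrac{2}{10}\cdot\tfrac{3}{11} = \tfrac{6}{110}$, which agrees with the second moment used in the proof of Theorem 6.3.1.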


Since the multivariate beta distribution is a special case of the
matrix beta distribution in which $K = 1$, we immediately have the following
corollary.

Corollary 6.3.3 Let the random stochastic vector $p = (p_1, \dots, p_N)$
have the multivariate beta distribution with parameter $m = (m_1, \dots, m_N)$.
Then, if $\bar m = \sum_{j=1}^{N} m_j$,

$$E[p_j] = \mu_j(m) = \frac{m_j}{\bar m}, \qquad j = 1, \dots, N \qquad (6.3.20)$$

$$\operatorname{var}[p_j] = \frac{m_j(\bar m - m_j)}{\bar m^2(\bar m + 1)}, \qquad j = 1, \dots, N \qquad (6.3.21)$$

$$\operatorname{cov}[p_j, p_\ell] = -\frac{m_j m_\ell}{\bar m^2(\bar m + 1)}, \qquad j, \ell = 1, \dots, N;\ j \ne \ell \qquad (6.3.22)$$

$$E\Big[\prod_{j=1}^{N} p_j^{v_j} \,\Big|\, m\Big] = \mu_c(m)\, E\Big[\prod_{j=1}^{N} p_j^{v'_j} \,\Big|\, T_c(m)\Big] \qquad (6.3.23)$$

where the $v_j$ are nonnegative integers, $c$ is any index such that $v_c > 0$,
$v'_j = v_j - \delta_{jc}$, and $T_c(m)$ is the vector $m$ with the element $m_c$ increased by unity.
Before considering the marginal and conditional distributions of
submatrices of $P$, it is necessary to define the nonstandardized matrix
beta distribution. Let

$$a = (a_1, \dots, a_K) \qquad (6.3.24)$$

be a $K$-dimensional vector, where

$$a_k > 0, \qquad k = 1, \dots, K \qquad (6.3.25)$$

Let $P = [p_{kj}]$ be a $K \times N$ random matrix with range set

$$\Big\{ P : p_{kj} \ge 0;\ \sum_{j=1}^{N} p_{kj} = a_k;\ k = 1, \dots, K;\ j = 1, \dots, N \Big\} \qquad (6.3.26)$$

Let $M = [m_{kj}]$ be a $K \times N$ matrix of positive elements. Then $P$ is said
to have the nonstandardized matrix beta distribution with parameter $(a, M)$
if $P$ has the joint density function

$$f_\beta(P \mid a, M) = \prod_{k=1}^{K} f_\beta(p_k \mid a_k, m_k) \qquad (6.3.27)$$

where $p_k$ and $m_k$ are generic rows of $P$ and $M$, respectively, and
$f_\beta(p \mid a, m)$ is the nonstandardized multivariate beta distribution
defined by equation (6.2.30).

We now consider the marginal and conditional distributions of any
$e \times v$ submatrix of $P$ when $P$ has the matrix beta distribution. To
simplify the notation, assume that the elements of $P$ and $M$ have been
relabeled so that $P = [p_{kj}]$ and $M = [m_{kj}]$ ($k = 1, \dots, K$; $j = 1, \dots, N$),
and that the submatrix of interest consists of the elements $p_{kj}$
($k = 1, \dots, e$; $j = 1, \dots, v$), where $e \in \{1, \dots, K\}$ and $v \in \{1, \dots, N-1\}$.

Define the $e \times (N - v)$ matrix

$$\bar P_{ev} = \begin{bmatrix} p_{1,v+1} & \cdots & p_{1N} \\ \vdots & & \vdots \\ p_{e,v+1} & \cdots & p_{eN} \end{bmatrix} \qquad (6.3.28)$$

and the $e \times (v + 1)$ matrix

$$P_{ev} = \begin{bmatrix} p_{11} & \cdots & p_{1v} & 1 - b_1(v) \\ \vdots & & \vdots & \vdots \\ p_{e1} & \cdots & p_{ev} & 1 - b_e(v) \end{bmatrix} \qquad (6.3.29)$$

where

$$b_k(v) = \sum_{j=1}^{v} p_{kj}, \qquad k = 1, \dots, e \qquad (6.3.30)$$

Accordingly, we define the $e \times (v + 1)$ and $e \times (N - v)$ parameter matrices

$$M_{ev} = \begin{bmatrix} m_{11} & \cdots & m_{1v} & \sum_{j=v+1}^{N} m_{1j} \\ \vdots & & \vdots & \vdots \\ m_{e1} & \cdots & m_{ev} & \sum_{j=v+1}^{N} m_{ej} \end{bmatrix} \qquad (6.3.31)$$

$$\bar M_{ev} = \begin{bmatrix} m_{1,v+1} & \cdots & m_{1N} \\ \vdots & & \vdots \\ m_{e,v+1} & \cdots & m_{eN} \end{bmatrix} \qquad (6.3.32)$$

Theorem 6.3.4 Let the $K \times N$ random generalized stochastic matrix $P$
have the matrix beta distribution with parameter $M$. Then, for $e = 1, \dots, K$
and $v = 1, \dots, N-1$, the marginal joint distribution of $p_{11}, \dots, p_{1v}, \dots,
p_{e1}, \dots, p_{ev}$ is matrix beta,

$$f_\beta(P_{ev} \mid M_{ev}) \qquad (6.3.33)$$

and the conditional joint distribution of $\bar P_{ev}$, given the values of
$p_{11}, \dots, p_{1v}, \dots, p_{e1}, \dots, p_{ev}$, is nonstandardized matrix beta,

$$f_\beta(\bar P_{ev} \mid b(v), \bar M_{ev}) \qquad (6.3.34)$$

where

$$b(v) = (1 - b_1(v), \dots, 1 - b_e(v)) \qquad (6.3.35)$$

Proof. The theorem follows immediately from the corresponding result for
the multivariate beta distribution in Section 6.2 upon noting
that the matrix beta density function is the product of $K$ multivariate
beta density functions.


6.4 Extended Natural Conjugate Distributions.
If $P$ has the matrix beta distribution, the rows of $P$ are mutually
independent random vectors. The decision-maker may, however, wish to use
a prior distribution which admits non-zero correlation between the rows of

$$\operatorname{cov}(\cdot, \cdot) = \cdots \qquad (6.4.12)$$

and it is seen that there is non-zero correlation between the rows of $P$.
Let $\pi(P) = (\pi_1(P), \dots, \pi_N(P))$ be the steady-state probability vector
corresponding to the $N \times N$ stochastic matrix $P$ and let $v = (v_1, \dots, v_N)$
be a vector of nonnegative integers. An extended natural conjugate
distribution for $P$ which is required for the analysis to be carried out
in Chapter 7 is formed by letting

$$\pi^{v}(P) = \begin{cases} \displaystyle\prod_{j=1}^{N} [\pi_j(P)]^{v_j}, & P \text{ stochastic} \\ 0, & \text{otherwise} \end{cases} \qquad (6.4.13)$$
Let $M = [m_{ij}]$ be an $N \times N$ matrix of positive elements and let $m_i$
denote the $i$th row of $M$ ($i = 1, \dots, N$). Then the $N \times N$ random stochastic
matrix $P$ is said to have the matrix beta' distribution with parameter
$(M, v)$ if $P$ has the joint probability density function

$$f_{\beta'}(P \mid M, v) = \begin{cases} W(M,v) \displaystyle\prod_{i=1}^{N} \frac{\prod_{j=1}^{N} p_{ij}^{m_{ij}-1}}{B(m_i)} \prod_{j=1}^{N} [\pi_j(P)]^{v_j}, & P \text{ stochastic} \\ 0, & \text{otherwise} \end{cases} \qquad (6.4.14)$$

where $B(m_i)$ is defined by equation (6.2.8). The normalizing constant
$W(M,v)$ is the reciprocal of $E\big[\prod_{j} (\pi_j(P))^{v_j}\big]$ when $P$ has the matrix beta
distribution with parameter $M$,

$$\frac{1}{W(M,v)} = \int \prod_{j=1}^{N} [\pi_j(P)]^{v_j}\, f_\beta(P \mid M)\, dP. \qquad (6.4.15)$$

$W(M,v)$ can be computed using the methods of Section 4.2, but this requires
lengthy calculations.
Using Lemma 4.2.1, it is easily seen that


The first two moments of the distribution are obtained from equations
(6.4.2) and (6.4.3),

$$E[p_{ij} \mid M, v] = \frac{W(M,v)}{W(T_{ij}(M), v)}\,\frac{m_{ij}}{m_i}, \qquad i, j = 1, \dots, N \qquad (6.4.17)$$

$$E[p_{ij} p_{rs} \mid M, v] = \frac{W(M,v)}{W(T_{rs}(T_{ij}(M)), v)}\, E[p_{ij} p_{rs} \mid M], \qquad i, j, r, s = 1, \dots, N \qquad (6.4.18)$$

where $m_i = \sum_{j=1}^{N} m_{ij}$ and $T_{ij}(M)$ is the matrix $M$ with its $(i,j)$th element
increased by unity.


Due to the complex calculations required to obtain the normalizing
constant $W(M,v)$, the matrix beta' distribution is presently of limited
usefulness. This distribution is, however, of some importance since it
is the natural conjugate distribution for one of the data-generating
processes to be considered in Chapter 7.


6.5 The Beta-Whittle Distribution.
The beta-Whittle distribution is defined to be the unconditional
distribution of the transition count $F$ of a Markov chain with transition
probability matrix $P$ which is drawn from a matrix beta distribution. The
beta-Whittle-2 distribution is defined in an analogous fashion. In this
section explicit probability mass functions are derived for these
distributions and their moments are discussed.
$$\mathcal{F}_n(u,v) = \Big\{ F : f_{ij} \in I;\ \sum_{i=1}^{N}\sum_{j=1}^{N} f_{ij} = n;\ f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv},\ i = 1, \dots, N \Big\} \qquad (6.5.1)$$

and let

$$\mathcal{F}_n(u) = \bigcup_{v=1}^{N} \mathcal{F}_n(u,v), \qquad u = 1, \dots, N;\ n = 1, 2, 3, \dots \qquad (6.5.2)$$

$\mathcal{F}_n(u)$ is the set of all possible transition counts, $F$, which can arise
from a sample of $n$ consecutive transitions in a Markov chain with initial
state $u$ and a positive transition probability matrix.

The beta-Whittle probability mass function with parameter $(u,n,M)$ is
defined as

$$f_{\beta W}(F \mid u,n,M) = \begin{cases} \displaystyle\int f_W(F \mid u,n,P)\, f_\beta(P \mid M)\, dP, & F \in \mathcal{F}_n(u) \\ 0, & \text{elsewhere} \end{cases} \qquad (6.5.3)$$

where $u = 1, \dots, N$, $n = 1, 2, 3, \dots$, and $M = [m_{ij}]$ is an $N \times N$ matrix such
that $m_{ij} > 0$ ($i, j = 1, \dots, N$).

When $F$ has the beta-Whittle distribution with parameter $(u,n,M)$ it is
clear that $F$ must have the range set $\mathcal{F}_n(u)$, since the set of stochastic
matrices which have one or more elements equal to zero is a set of measure
zero relative to the matrix beta distribution.

It is seen from (6.5.3) that $f_{\beta W}(F \mid u,n,M) \ge 0$. By comparing (6.5.1)
and (6.1.5), it is seen that $\mathcal{F}_n(u) = \mathcal{F}_n(u,P)$, provided $P$ is a
positive matrix. Since the set of nonpositive matrices is a set of
measure zero and since $\mathcal{F}_n(u)$ is a finite set, we have

$$\sum_{F \in \mathcal{F}_n(u)} f_{\beta W}(F \mid u,n,M) = \int \sum_{F \in \mathcal{F}_n(u,P)} f_W(F \mid u,n,P)\, f_\beta(P \mid M)\, dP = 1. \qquad (6.5.4)$$

Thus, the beta-Whittle mass function is a proper probability mass function.


Theorem 6.5.1 The beta-Whittle mass function with parameter $(u,n,M)$
is given by

$$f_{\beta W}(F \mid u,n,M) = \begin{cases} F^{*}_{vu} \displaystyle\prod_{i:\, f_{i\cdot} > 0} \frac{f_{i\cdot}\, B(f_{i\cdot}, m_i)}{\prod_{j:\, f_{ij} > 0} f_{ij}\, B(f_{ij}, m_{ij})}, & F \in \mathcal{F}_n(u) \\ 0, & \text{elsewhere} \end{cases} \qquad (6.5.5)$$

where $m_i = \sum_{j} m_{ij}$, $B(x,y)$ is the beta function, and $v$ is the unique
solution of the equations

$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \dots, N.$$

Proof. Letting $p_i$ denote the $i$th row of $P$,

The integrand is the kernel of a matrix beta density function with parameter
$M + F$; hence, using (6.2.1),

Q.E.D.
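Formula (6.5.5) can be checked by summing it over every transition count reachable in $n$ steps from $u$. The total should be one, as (6.5.4) asserts. The sketch assumes NumPy and Python's `math.lgamma`. The parameter matrix `M`, the helper names, and the 0-based states are illustrative.

```python
import itertools
import math
import numpy as np

def log_beta(a, b):
    """log B(a, b) for the ordinary two-argument beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_whittle_pmf(F, u, M):
    """Beta-Whittle probability (6.5.5) of F, given start u (0-based), M > 0."""
    F = np.asarray(F, dtype=int)
    M = np.asarray(M, dtype=float)
    N = len(M)
    row, col = F.sum(axis=1), F.sum(axis=0)
    target = np.eye(N, dtype=int)[u] - (row - col)       # indicator of v
    if np.count_nonzero(target) != 1 or target.max() != 1:
        return 0.0
    v = int(np.argmax(target))
    Fstar = np.eye(N)                                    # F* of (6.1.8)
    for i in range(N):
        if row[i] > 0:
            Fstar[i] -= F[i] / row[i]
    minor = np.delete(np.delete(Fstar, v, axis=0), u, axis=1)
    cofactor = (-1) ** (v + u) * np.linalg.det(minor)
    m_row = M.sum(axis=1)
    log_w = 0.0
    for i in range(N):
        if row[i] > 0:
            log_w += math.log(row[i]) + log_beta(row[i], m_row[i])
        for j in range(N):
            if F[i, j] > 0:
                log_w -= math.log(F[i, j]) + log_beta(F[i, j], M[i, j])
    return cofactor * math.exp(log_w)

def reachable_counts(u, n, N):
    """All distinct transition counts of length-n paths starting at u."""
    seen = set()
    for path in itertools.product(range(N), repeat=n):
        x = (u,) + path
        F = np.zeros((N, N), dtype=int)
        for a, b in zip(x[:-1], x[1:]):
            F[a, b] += 1
        seen.add(tuple(map(tuple, F)))
    return [np.array(F) for F in seen]

M = np.array([[1.0, 2.0, 0.5], [1.5, 1.0, 1.0], [0.7, 0.3, 2.0]])   # hypothetical
total = sum(beta_whittle_pmf(F, 0, M) for F in reachable_counts(0, 4, 3))
```

Working with `lgamma` keeps the beta functions stable for large counts. For a single transition $u \to j$ the formula reduces to $m_{uj}/m_u$, which is the prior mean of $p_{uj}$.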


The moments of the beta-Whittle distribution are somewhat complicated
to compute. Referring to equations (6.1.14) and (6.2.27), if $F = [f_{ij}]$
has the beta-Whittle distribution with parameter $(u,n,M)$, then

$$E[f_{ij}] = E_\beta\big[E[f_{ij} \mid P]\big] = \sum_{m=0}^{n-1} E_\beta\big[p_{ui}^{(m)} p_{ij}\big], \qquad i, j = 1, \dots, N \qquad (6.5.8)$$

and, similarly,

$$\operatorname{var}[f_{ij}] = E_\beta\big[\operatorname{var}(f_{ij} \mid P)\big] + \operatorname{var}_\beta\big[E(f_{ij} \mid P)\big], \qquad i, j = 1, \dots, N \qquad (6.5.9a)$$

$$\operatorname{cov}[f_{ij}, f_{rs}] = E_\beta\big[\operatorname{cov}(f_{ij}, f_{rs} \mid P)\big] + \operatorname{cov}_\beta\big[E(f_{ij} \mid P), E(f_{rs} \mid P)\big], \qquad i, j, r, s = 1, \dots, N \qquad (6.5.9b)$$

In both of these equations $E_\beta[\cdot]$ denotes the expectation operator relative
to the distribution $f_\beta(P \mid M)$. These expectations can be evaluated by
repeated application of Lemma 2.3.2 in a manner which should, by now, be
familiar, but the calculations, particularly in (6.5.9b), tend to become
extensive. Approximations of the sort we have discussed in an earlier chapter can
also be made. For small values of the parameter $n$, direct calculation of
the moments is probably the most convenient way to approach the problem.




6.5.2 The Beta-Whittle-2 Distribution. The set of all possible
transition counts $F$ which can arise from a sample of $n$ consecutive
transitions in a Markov chain with arbitrary initial state and a positive
transition probability matrix is

$$\mathcal{F}_n = \bigcup_{u=1}^{N} \mathcal{F}_n(u), \qquad n = 1, 2, 3, \dots \qquad (6.5.10)$$

The $N \times N$ random matrix $F$ with range set $\mathcal{F}_n$ is said to have the standard
beta-Whittle-2 distribution with parameter $(p,n,M)$ if it has the probability
mass function

$$f_{\beta W2}(F \mid p,n,M) = \int f_{W2}(F \mid p,n,P)\, f_\beta(P \mid M)\, dP, \qquad F \in \mathcal{F}_n \qquad (6.5.11)$$

where $p = (p_1, \dots, p_N)$ is a stochastic vector functionally
independent of $P$, $n = 1, 2, 3, \dots$, and $M = [m_{ij}]$ is an $N \times N$ matrix with
$m_{ij} > 0$ ($i, j = 1, \dots, N$). It is readily established that $f_{\beta W2}(F \mid p,n,M) \ge 0$
and that

$$\sum_{F \in \mathcal{F}_n} f_{\beta W2}(F \mid p,n,M) = 1.$$

Let

$$\mathcal{F}_{n1} = \{ F \in \mathcal{F}_n : f_{i\cdot} = f_{\cdot i},\ i = 1, \dots, N \} \qquad (6.5.12)$$

and

$$\mathcal{F}_{n2} = \mathcal{F}_n - \mathcal{F}_{n1}. \qquad (6.5.13)$$

Since

$$f_{W2}(F \mid p,n,P) = \sum_{u=1}^{N} p_u f_W(F \mid u,n,P),$$

it follows from Lemma 6.1.5 and the fact that $p$ is functionally independent
of $P$ that

In (6.5.14), if $F \in \mathcal{F}_{n2}$, $(u,v)$ is the unique solution to the equations

$$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \dots, N$$

A case of particular interest
occurs when $p = \pi(P)$, the steady-state probability vector corresponding
to $P$. In this instance we define the nonstandard beta-Whittle-2 distribution
with parameter $(n,M)$ in terms of the following probability mass function:

$$f^{*}_{\beta W2}(F \mid n,M) = \int f_{W2}(F \mid \pi(P), n, P)\, f_\beta(P \mid M)\, dP, \qquad F \in \mathcal{F}_n \qquad (6.5.15)$$

where $n = 1, 2, 3, \dots$ and $M = [m_{ij}]$ is an $N \times N$ matrix with $m_{ij} > 0$
($i, j = 1, \dots, N$). The vector $\pi$ in the integrand of (6.5.15) is the
steady-state vector corresponding to $P$ and is uniquely defined for all $P$
except a set of measure zero. It is clear that the range set of $F$ is
$\mathcal{F}_n$ and that $f^{*}_{\beta W2}(F \mid n,M)$ is a proper probability mass function.
Theorem 6.5.2 If

$$\bar\pi_i(M) = E[\pi_i(P) \mid M], \qquad i = 1, \dots, N \qquad (6.5.16)$$

is the expected value of $\pi_i$ when $P$ has the matrix beta distribution with
parameter $M$, then the nonstandard beta-Whittle-2 probability mass function
with parameter $(n,M)$ is given by

$$f^{*}_{\beta W2}(F \mid n,M) = \begin{cases} \displaystyle\sum_{u=1}^{N} \bar\pi_u(M+F)\, f_{\beta W}(F \mid u,n,M), & F \in \mathcal{F}_{n1} \\ \bar\pi_u(M+F)\, f_{\beta W}(F \mid u,n,M), & F \in \mathcal{F}_{n2} \\ 0, & \text{elsewhere} \end{cases} \qquad (6.5.17)$$

where $f_{\beta W}(F \mid u,n,M)$ is given by (6.5.5).
In (6.5.17), if $F \in \mathcal{F}_{n2}$, $(u,v)$ is the unique solution to the equations

we have

$$f^{*}_{\beta W2}(F \mid n,M) = \sum_{u=1}^{N} \int \pi_u(P)\, f_W(F \mid u,n,P)\, f_\beta(P \mid M)\, dP. \qquad (6.5.18)$$


The kernel of the integrand in (6.5.18) is

$$\pi_u(P) \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij} + m_{ij} - 1}.$$

Thus, proceeding as in the proof of Theorem 6.5.1,

Equation (6.5.17) follows from this result and Lemma 6.1.5. Q.E.D.


The moments of the standard beta-Whittle-2 distribution can be obtained
from the moments of the beta-Whittle distribution by using the relation

$$f_{\beta W2}(F \mid p,n,M) = \sum_{u=1}^{N} p_u f_{\beta W}(F \mid u,n,M) \qquad (6.5.20)$$

The moments of the nonstandard beta-Whittle-2 distribution are given by the
following theorem.

Theorem 6.5.3 Let $E_*[\cdot]$ denote the expectation operator relative
to the nonstandard beta-Whittle-2 distribution with parameter $(n,M)$ and
$E_\beta[\cdot]$ denote the expectation operator relative to the matrix beta
distribution with parameter $M$. Then

$$E_*[f_{ij}] = n E_\beta[\pi_i p_{ij}], \qquad i, j = 1, \dots, N \qquad (6.5.21)$$

$$E_*[f_{ij} f_{rs}] = \cdots, \qquad i, j, r, s = 1, \dots, N \qquad (6.5.22a)$$

$$\operatorname{cov}_*[f_{ij}, f_{rs}] = \cdots, \qquad i, j, r, s = 1, \dots, N \qquad (6.5.22b)$$

Proof. The theorem follows from the moments of the Whittle-2 distribution
with $p = \pi$, together with the relation

$$E_*[g(F)] = E_\beta\big[E_\pi[g(F)]\big],$$

where $g(F)$ is any function of $F$ for which the expectation exists and $E_\pi[\cdot]$
is the expectation operator relative to the Whittle-2 distribution with
parameter $(\pi,n,P)$. Q.E.D.
CHAPTER 7
FIXED SAMPLE SIZE ANALYSES


In Chapters 3-5 we examined some sequential sampling problems in a
Markov chain with alternatives. We now consider the prior-posterior and
preposterior analysis of a Markov chain governed by a fixed, but unknown,
$N \times N$ matrix of transition probabilities, $P$, from which a fixed number of
consecutive observations is drawn. In Section 7.1 this analysis is
carried out under the assumption that the initial state is known to the
decision-maker before the sample is observed. In Section 7.2 it is assumed
that the initial state is unknown and has a distribution which is functionally
independent of $P$; in the final section it is assumed that the chain is
operating in the steady state and that the initial state is unknown.


7el Tutied State nome 

An Mestate Harkew chain can ba considered te be a process whieh 
generates the sequence of random variables, ie Ro naive %e coop Where 
%, 0 {ts v.00 NS 40 tho state of the systen inseilately after tho 4th 
tranaition (201,2,...) and % de the inktiel state obssrved before the 
first transition. This initial state, % = T 4s eubjest te the 
‘etritutien de (Pye naeg P do where p is a stochastic vector and 
p= Pte = 4} (40, ooo, B). Tho transitions of the chain are governed by 
tho N x H stochastic mtrix P= Cp, Je where n, , Ae tale 283 
(Ap jek, coop Nz MPO,%92,0e.). It As agsumed in this section that the 
intial state is knmm te the denielonm-maker. Thus, in this cage, 
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By = Bar Sai, e6ep R (Feded) 





eateries Anpiveds. let 3, = (ae esey x) be a seme 
of m consecutive transitions in a Markov chain, where % = u is assumed 
knew to the decislonemaker befere the sample is obtained. ‘Ths, x. 4 
ebtained under the consecutive sampling rule. let Fo (z, 3 aide 
tranal tian count of = BOGS ee, given Pe P, 
ef observing the sample x, is 
Poa a eee sis Tl ms £44 5 (7ohe2) 
If the stopping prosess is noninformative, then (7.1.2) is the kermel of 
the Wikelihood of the same. It 4s clear that the statistic F ocnveya 
all the information of the sample ami that, if stepping is noninformativoe, 
EF is a sufficient statistic. 
When the transition probabllity mateix 4s regarded as a random materia 
B, the natural. conjugate of (7.1.2) 9 the matrix beta distritution defined 


by equation (6.34) with Ry s 2 (m3, e208 B)> 


el » TT 4, is (744.3) 
fat 1 . 

If Phas the matrix beta distribution with parameter M° « Cag ,] and a¢ 

@ sample from the prooess yields a sufficient statistic Fo Theoren 2e2eh 

gshews that the poaterloer distribution of Fis mateis beta with parameter 


ue a Hie e Be C7 eLatt) 
Velie? Soupline Bistetiatiens ead Brenss Lalor fsalysia. It is asamed 


chet x ou ds imow and thet n, the mmber of transitions to be ebservad, 
26 Getermined before the sample is obtained. Prior te sampling, the 
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oie 
trans. tien count F fe & Fandom matwie and the conditional prebabilit, 
given P= P, that the Markov chain will generate e specific sample g 
uhioh hes the transition count F ts given by (7.4.2). Whittle [44] has shown 
that the nomber of samples of cise n with x, & u which have the tranaition 
count, F is given by 


e (Fe 4.5) 





. N 
whore £, ad pA £34 (42%, eceg N}» vis the final state of the sample, and 
Fs 4s tho (vou)th cofactor of the matrix r defined by equation (6.4.8). 

Tims, the conditicnal. probehility of F 4s given ty the whittle preted Mty 


mags function defined ty (60467), 
P CF \ wemeP] a 2") ¢p | weg?) (70406) 


If a saaple of n consecutive transitions ia ebteined from a Markov 
chain with loxen initial state u end 4f the trensition matric 7 has the 
satel beta distritution with parencter HY, then the uncomlitdonal 
Hateibution ef the transition count Fas 

: | | (f) (Hei) 
DF | uarapti*) Pa ye 7 (F | wemePIE (B | Hoare : (Feke?) 
A 
It 4e soon from equation (6.5.3) that the unconditional Gctritution of 
F as the beta-Whittle mass function given ty (6.5-5)¢ 


DEE | uarekit) = £0 ¢E | uanek?) (704.8) 


If the prior distelbation of F ie matrix beta with poraneter M’ ond 
4? a sample of sise n yiclds a oofficient statlatic Fe then equations (7 of.) 





oZe 
amd (6.307) shew that the mean of the posterior Gistributien is 


Eo (5 J (70449) 
Tie, &f a e ¢ My ge 
ma? + 
Pr e 4) it. Lp J™hy coop N (704040) 
3 ow +2 
Le ae 


Before observing the sample, B° 46 @ random mitrix which can take once of 
& finite sot of values in the range set R(upmpll?). Let ; 


S(B) © 8 | Be Feftte)s =e pt PeR(ugre*) (702-84) 


be the set of mosakble trencitien osunts whieh result in a posterior moan 
with the value PeR(ugnyff*). ‘Then, ty (7.1.9), the distritution of the 
posterior mean 4s given by the following prebatility mass fumotion, 


i 
Mf = gloomy ® on toi | 20a8 Bertugnsl 


ma O. elacuhare (7? hel2) 


72 Fatiel State Uoknam., 

Ve now assume that the inktiel state, % = W% is urleomm to the 
Gecidien=naker befere the sammile 4s observed, but thet T has a probability 
Giatriation, p= (pip -++p Bde witch te functionally Independent, of P 
and whieh may or may not be lenown te the decision-maker. ig Pie urissm » 
4% 4s cleo assumed that the utlidty of any terminal deolsion made after 
%, is observed depenis only on P and not on §. 
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let zB & Gh oseg x) be @ 
aemmle of mn consesutive trangitions in a Merkev chain. lat us 2, be the 
initial state observed and let F = [f, 3 be the transition eount of the 
SANT. than tha contitiena protebthity, etvn F= P ant $= Pp of 
observing the sample # is 
nO 

Pe ees ae 2, J Bays, (7.2%). 
If the stopping process in voninfemative, then, sincs terminal utilities 
depand only on F and not on 8, tho kernal of the Lkelihced of the sample 
4s 

bis i my 3d aa (7.202) 

and F is 9 morginally eafMiekent etatictio. 

When the mteiz of trensition probebii4ties is treated as a ranion 
matrix F, the naturel conjugate of (7.2.2) is the mtriz beta Getrébaution 
defined ty (6.3.4) with K, = 2% (imkp «oes H)e If P has the eatsia beta 
eee ion stith: parentoten, £9. toys arrl 42 a sample fram the process 
yaelds 4 marginally sufficient statistic F, than the posterior Aistediution 
of P is matrix beta with pareneter 

ie 2 ie + Be (7s203) 


7.202 Sommiine Bit 
assumed that n, the nunber of transitions to be obsarved, 1s deteraine: 
before the sanplle ig obtained. Peter to sampling, the pair (Hf) 49 a 
rendon quantity and the conditional probability, given Fe P and f= pp 
that the Markov chain will generate a specific sample g with the 








fh 








2b 
statistic (uf) is given ty (7.2.4). the mumber of samples of clas n 
with initial state u which have the transition eount F 4a given by (7.15). 
Therefore, the conditional probability of (8F) ie given by the Winittle=1 
probability mass function defined by equation (6.4.37), 


PlasE | pomeP] = 6 aay | BetisE)- (7a2eir) 


The aonditionsl disteitution ef the marginally sufficient statistic F is 
the Whittle=2 prebabity mass fmotion given by equation (6.1.65), 
PCF | pore} © 20 ¢g | pyneP). (72245) 


“Ef a emnple of n conscoutive trensitions 42 obtained fom 4 Harte 
chain tere the distritution of the initial etate La imow te be p and 
where the trangition probability matrix P has the matrix bets distritution 
with paraneter N*, thon, provided p 4s functionally Andepentent of P, the 
tmeonditional distritation of the transition count F is 


DE | pm?) = oh) HO 'CE | para) on CB | yoda. (7.246) 
fims, by equation (6.5.14), the moomtional dhotsivation ef F 4s the 
betethittle=-2 probability mass fimetion given by (6.5.18), 

OF | Potioli®) 3 1 OE | Botallt)- (Fo2e?) 
ft? B 4s urkrewn and has the pelor Gistelbutden fometion Rp Ts wh th 
uneomiitional distelbation of F 4s ales beta-thbettle-2, 


DCE | Yoru?) ite SFE | poral dan(p 1) 


ed “a (Bl C1) omer). (722.8) 
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5) Gem 

re P haa the matrix beta distribution with prior parameter K° ex) 4¢ 
@ semplo yLelds the marginally efficient statistic Fo the mann of the 
posterior Metritation of F is given by equations (701-9) andi (7.1.40). 
eae *> cbrerving She seetiia|the aeetertor seme 12 ele nea nice 
with the finite range set R (my8°). Let 

3 *(e) c iz lz Fe io (n), Pes r} BeR'(my¥*) (742.9) 

be the set of possible transition saiiina: shiek dace ee 
BY = PeR (ngii*). ‘Thon, from (742.7), wo find that, 4¢ pis know, the 
distribution of the posterior mean ie given by the following probability 
Psat 5) al 


Pcp = P| Pattelft I a 2. «) oF) awe “EP Petigli®) PeR (ea?) 
(2) 


@ Oe elgavhers (7.2.40) 
Srileriy, 4f 5 is urimew ari hes the perfor distributlien function Alp IT). 
the. Asteibation of the posterior mean is 


PC = 2 |Yon.ut] © ps ae 2) was 6B) BCL omy) PeR (ngit*) 


@ O. elsahare (72042) 


703 gesten Onemating im She Steady, states 

When the Harkey chain is operating 4n the steady state and the Initial 
atate, &, de unknown, the disteitution of & ta TAP) = (77GB), coos wzCBP)s 
the steady-state probabliity veoter aseéclated wlth the tranditicn matrix 
B In this case, observation of ¥ provides information ahout P. 


73h bal Saban eior fine israel. Lot Zz @ (zs seep & ) be G sommille 
priate For 7 ba ass'td A Weel Aled Wher PCR ee 
7 





os 


or 


f 








ofhGe 
steady state. if us %, ds the initial state end f= (2, 4] is the teanci tion 
count of the compile, the conditional probability, given that f= Bs of 
obeceving the sample x, is 


BOR 

Try) Paya Pa es TOD TY Merge 5. 
When stopping is noninformtive, equatien(7.3-2) is the kernel ef tho 
Likelihood of the sampile and the orderad pair(a,F) 46 a eafficlent statiatic. 

then £, tho matrix of tranahtion probebilities, 4s regarded as a 
vandon matrix the netural conjugate of aquatiion (7.3.4) 4a the natriz betaei 
Aletetbation dofined ty (6.41%), ©) (P) yaw). Tt do cantly eoen thot, 
Af F hos the oateix betert pe with parameter (Hs y*) and af a 
eample from the process yields a ouffidlent statistic (uf), tho posterior 
disteitution of Fie matrix bota-% with parameter (N,v), where, 4? 
g,, 4¢ an Tedimensioral. sew veoter wth uth eomponcet equal to one aril ald 
other components equal te ser, 
1 oH? + F (703-22) 


— £3 ¥° & Sy (7 2392b) 


As was noted dn Seation 6.4, the normalising constant ami the moments 
of the matrix bete=i Usteiintion are diffieslt t commute. This dif mfantlty 
cuplicates the task of asslerning a specifiie mstzis betee] neler 
Asteitation to B. Since the satrix bete detritution 4s also a matrix 
- betael distribution with the parancter v 3 (0, oe, 0) = Oy, 





cM rl | gat (Pl 0d» (7-363) 


4% may ba converlerrt for the dedialen-maker te use a matrhe beta prior 
disteibution for P ani we shell ssaume thie to bo the csge in Gisqussis 
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aie 
tae prepostemior analyais ef a Marker chain operating in the steady state. 


7302 Seouljoe Batribations aad Eeenostevior Jualzsis. We assume 
that m, the munber of trensitéons te be observed, is Mimad in advance of 
sampling ari that the prior disteitation of Fis matrix beta. Prior to 
camping, the conditions! mwebshility, given Pa P, of obtaining a 
apeckfic comple 3 with the statistic (ug) de given by (7.3.4). ‘the 
tmber of samples of sise n with initial stete u which hove the transition 
comt F ie given by (7.125) anil, therefore, tho conditdenal probatility 
of the statistic (H,F) 4s given ty the Uhittle-2 probability mass function 
ag defined ty (60137)s 

Pog | myp} = 2°) cugP | ICP) mee)» (7-368) 
the marginal. conditional dleteibution of & ie Ip) ani the marginal 
contitional dictritatten of F Ag the Whittle? distribution, 

20) (p| W(P)amB)- 

When a sample 3, 4s obtained from a Markey chain operating in the 
steady state there the initial state fe ummm ani the transition 
probatiMty matrix F has the matriz beta Gstribation with paranotor i", 
the unconditiom2 distedtation of the transition cunt F as 


DKF | ral?) = | 2 e| Kemp ee (weap. (72365) 


Therefore, ucing aguitins (6.5025), the unoniitional detribation of F 
de nonstandard betedhittle=2, aq given by equation (6.5.37)s 
HE |ngye) = £00 CE | mys) (7.3.6) 


Zt ds then costly coon that, 4¢ the eet 3°(P) is defined ty equation 
(7.209)_ the orior distelintion of the posterior mean is given by the 


cay 


te 
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following probability masa function, 


pd? = P| nM?) = va , 
7 8 Sete ee). 
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CHAPTER 8 
SPECIFIC RESULTS FOR A 
TWO=STATE MARKOV CHAIN 


Many of the matters considered in previeus chapters are specialised 
te the caso of a tusestate Morkov chain in this chapter. The 2 x 2 
transition probability mtrix Fis assumed te have the matrix beta 
distribution and explicit feormias are found for the means and preduct 
moments of the n=step transition probabilities, the steady-state 
probabhiities, the precese gain, and the expected total discounted rewards. 
The chapter concludes with a result coneerning the selection of an 
optinal terminal poliay for a tem-state process sith a special type of 
veward structure. Most ef the formas derived hero are double infinite 
series; it appeare doubtful thet similer expressions can be obtained 
for cheins with more than tw states. 


Sei Praliminaries. 
Let 
P@ Lex & O<xysi (8.2.8) 


be the transition probatdiity matrix for a twoestate Markev chain. The 
eigenvalues of P are the roots of the equation 

Jaze Pl = A” @ (Aesmy)h + (omey) 2 0 (8.4.2) 
and are found to be 
A, @ i O<x,y<i (8.2.32) 


a 
A, @ omy, O<mysi (8.1.5) 


=21Q—= 
The eigenvalues of Pp are distinet provided x and y are ret both equal te 
sero. When Ay F Xe Sylvester*s Theorem leads to the spectral decomposition 


Ps + (fo ee sae ‘ 8.4% 

ze “yy (femey) = — ( ) 
y £ oy y xf Oory #0 
mty wey wy By 


Equation (8.1.4) dsmediately gives the following expresaiens for the steady~ 
state veotor, 





e 4s y R 8.1. 
me = : —=--| : (8.45) 
| xf0 oF yFD 
and the nestep probability matrix, 
A ( (a) 2 pa <n * a : = 
E rit ie ae “a | * ee a 
 W y = = : 
Pp Dp onuacmamn CMTS 
24 22 wey ary a tid 
pA sit say (8.2.6) 
In particular, ie ot a 
@ sy 
pi) = Be (1 - (emy)) © ate (lesey) 
eee. origi (uta 
ie ‘7 e008 adhered 
=. EVs s wf oF yf 
en, einilerly, 


pf) = 7 e (tomy). Peig2e3zeee (862.8) 
is bon) mf or y¥0 


Let the process have the reward matriz 


R & te = 18 b (8.2.9) 
e d 
where PF. is the reward earned whem the process makes a transition from 


43 


state 1 to state j fo sot o> }. 








Let 


He ay rh (8.2.90) 
St 
Pp q 


and assume that P has the matrix beta distribution with parameter M, 


a er oe | 
B(man)B( p,q) & (tox) ¥ (ley) e (8.4.48) 


(2,2) 
fo (2\y= 
Tims, % anit ¥ are independent random variables, each having the univariate 
beta distritution. It 4s to be noted that equations(8.1.7) and (8.1.8) are 
valid for all ged, exeept a set of measure sere relative te the matrix 


beta distribution. 


8.2 Hypereoometric Costfislentas 
Xt will be convenient te use the hypergeometsria coofficlent, (Zz), 


in the equations of subsequent sections. This eseffiacient is defined here 
and some of ite properties are derived. 

Let =x be any real mumber and k eny nonnegative integer. The hypere 
geometric coefficient ie dofined oy 


(x), 3 e(sti) .2.(wkei), | KOA e2,000 (8.204a) 
si, el) (8.2.4b) 
ff x> 0 it is clear that 
(x), © iim (8.2.2) 
T (x) 
and in the case x = 1, 
(4), = kh (8.2.3) 


lemme 8.2.4 If (30), is the hypergeonctric coefficient defined ty 
(8.2.4), then the following relations held, 
z(set2), s (x), (ste) (8.2 el) 


*. 





@2i2= 


a(n) s (=) od (80205) 
(x), Ps (32), (atic) (8.26) 
stark) Pets (xe), (xottc) (aeoter'd) (8.2.7) 
(2),-(se0%) = 4 (8.2.8) 


Brog. Equetions (8.2.8) and (8.2.5) follow by writing 
meet) = mmr) oo-(sce) 3 (x), (aie = (oa? (8.2.9) 


(8.2.7) We USE (8.2.6) and (8.2.4) te obtain 
x(zti) 2 xe(r$), Caevter’d) 


3 (x), (cate) Carlen) « (8.2.10) 
Equation (8.2.8) follews by direst expancion, 
(x), (aot) (seh) oo 0fmtzed) (mec) oo ( eevee) 


s (=) ay” (8.2.48) 





We fixe’ consider the men Using the binontal 


theoren to expand the factor (tomy) of ae (8.1.7), 


k 
Cemy)* = = CS yay” y” (ey) (8.304) 


we eam urite 











Mal, 
ag? j= fs E (tomy) o\7?)(p | map 
P hag 
Mad te 
e‘s 5 (¥ yet) EOMEteeRm). (8.3.2) 
ee) 0 aR 


For a = Oplg2ycce 


1 
4 
B[M1n%)"] © se { gittt)ni es .yatont a, 


Blan) 
B(mta, moi) ) (m) 
and : ‘ 4 
Foes (praja’ qui 
BLP] = Soe) oY (tay)" dy 
SB B( peas) = Pg e (8.30%) 
Bl pea) (pra), 
Tmse we have 
| Mot ¢ (p) 
i ie ee a a mew Py - (8.365) 
keQ wn Carnet), (pra) 
[1525 3¢000 


fhe following recurrence relation, which follows Semediately from (8.%-5)» 
is ef use for compiting successive values of sree, Je 


(m) (p) 
He 1s tage] +P Ee AMD aa adn sss 
(armed), (Dra) 


Va 15253, 0060 (8.2.6) 
In a similar fashien an axpresaion fer ep? 4s easily derived, 
using (8.4.8). 


eae ys Yn raved auihe 
BB DEE ECS yet)” w cytety a cet 


.« 
5 





a2 lin 


wilh > 
a's aon H y(t)” and ae (6.3.7) 
toon). (pra). 
c [AB g2scee 
For purposes of commutation, we have the reourrence relation 
(a), () 

apy ow aoe ae yen? Ace vet c0,3.8) 

ya) (peg 


me oy re ae 
FP MAetpces 
The derivation of (8.3.5) and (8.3.7) depended upon the fora of the 
exprescions (8.1.7) and (8.1.8). Similer emressions canmet be obteined 
for ed and pi), Tmay the diagonal elements of the mean n=step 
cs probability matrix mist be computed from the ro'ations 


ate) = sates?) | (823496) 
amd 
ae) = wm. (9.3.90) 


We naw verify that ete ] eathefles the recursive equation (4.4.2). 
Tast 2e, wo shall shew that oo 2 an ] eatisties 


2 
BS Map] Bee) BAM. (8.3410) 


where D, (H) te the expected vase of 0) 2 ees (os ee See 
Ao and ubere , 4(i) 4s the parancter matria M wth Ste (2,J)t 


ij = 
Slement inoreased by unity. 
. Shnew 
Pp) = ae (8.3.28) 
Pop) = see (8.3.$4b) 


the right side of (8.3.20) is 








Botte _ntt SS Cy ed)” rae Dy “ 
( ney 
sat toy PD), 


4) (a) yiP) 


ke 
£ 
Pg em inf ved (meet), (Pratl) | 


« (8.3.42) 


Using (8.2.4), equation (8.3.41) can be written 


fot & (p) 
Silas ec et) Brey ie 
ee nt 8) (aimed), (pra), PiGhy = BMAF EHeow 


(8.3.23) 
Since - 
q am med B mtigony > pre (8.303%), 
prqry mart iekey mintitkey prqtv 
(8.3.43) becomes, upon applying (8.2.6), 
Pot et PL, 
—— E £ Ee ee 


(ome) (pra) ¢ 


gow A 
ot k 
ge z Oyen” Peviev (Py | : (8.3045) 
a keto ey 


letting J ¢ vil in the first am ond mting that (") + G) = (4) 2 
wo obtain 


n Hel c wees EK teeg (p) 
va [ wo Germs yea Sv e” tehertev!Py 


(avert), PhD) 


eet = Co) 
+ (wf) 2} (8.3086) 
. {pq ‘ 


ined 


which, upon letting j @ k+i aml collesting terms, is aap, as 
required. A elmer derivation shows that B47(W), as defined ty (8.3.7)s 





eatisiies (4.2) 2 


eg shad) 
Bt Expected Value of Pop ys. 
Ueing (8.4.7). BO NAV@, for fixed Ps 


fol pot 
ah)? = x a 





3 = (leomy, (8.4.4) 
1 kd 
amd, therefore, 
yee, ne rok ae eet nie vite 
eee t 2 2 Ge Bo BLE J. (8.i4.2) 
r B(mtc,n02) (n) (n) 
E a tn y &3 2 bad Bib. 
a (40%) ") Stan) 5 Ga, 7) : (8.4.3) 
CEO si s2pces 
we have 


wl pot 
wt egty2 ee AS dey Ogee! Pe 
= (ain), $0 me m0 (mint?) 5, PD, 


¥ 
Meise, eve (B2%—%) 
Similarly, ty equation (8.1.8). 


a 


ZS (lomy) (8.4.5) 


ce)? oe os 
oo 


and 
i pol (a) (p) 
sccoe)?3 .e z oy ayy eee. (Balte) 
(1D) soy PPD 0 
PPig2yeee 
Finally, since 





pod fol 
pe sy fc ¢ (tomy), (S.t.77) 
$n kes 
we have 
ata s)) 2 att. a oe “¥ (Sy ¢ ay” (2) stay?) wag : 
0 =O wd (mime) (neq) 
Stlcoy wed 


FOigegeve (8.4.8) 
The same methed can be used te derive more general product moments of 
the form atatadse) % 


8.5 StesiueState Pmbsbilitien. 
We now obtain expresecions for the means and preduct macnts of 
EE) = ( 7). SLiver [38], treating the special case where y is 


known ami ¥ has the beta distrilmtion, has show that ef Tr q] 49 8 Gaussian 
® 


hypergeonetric function. 
By Theoren 4.2.5, 8, BtBV=)] © BC #7.) and using equations (8-3-7) 
ard (3.3.5)> we imnetiately have 


com Me 
a 7,1 2# fd ¢£ (Ey (oa)” Poo vos (8.524) 
or ¢ gery) pra vet 
co ¥ : 
ECW] = eo Ryton)” easy . (8.52) 
k=O wed Cornet) (pia) 


fheoren 4.2.5 implies that the series (8.5.4) and (8.3.2) both convarce. 


dS ait ae Hoglecting the constant 
maltiplier -_ , ani noting thet (* yo¢s ye the series of absolute 
men 


values correspording to (8.5.2) is 


ee a 
Erdelyt. [17], oh. 2. 


We 
te 





me | 
agen 
Do 34 2 
. 
Our . 
- 
rh 
: _ 





= 3 (Ky (a), y'P), ot te os (a), (P) 
9, : y (memei), (pea) 
keQ <a) mrdsrd oyna a ve) kad (aime), (pra) 
oo co Fh) (a) (pd, + us 
a = 
eS FB Cisinypsmintt Pras 1,1). (8.523) 


there F(asPp8"syay"s Zoey) Ls Appell’s second hypergeometric funstien of 
two arguments [2]. Sinee F(cpBeB* avey's Ray) diverges whenever |x| + ly| > de 
the series (8.5.3) diverges and, therefore, the series (5.5.2) eonverges 
condi tionally. A similar prwof establishes the oomiitienal convergence of 
(8.501). 

It 49 eashly verified thet Ef 7 1 and J satishy equation (4.2.'0a), 
Let 70) © af 77) one Then 4 mst be cham that mo satisfies 


74) & is W 1c eg (H)) By §2152 . (8.5.8) 
es cuistAme tte cane dots ee paren anaaVaTL: 
For j = 2p the right side ef yor 5 4s 


‘ (a), <p) 
=, Eyes” i a es — 7 
fae. wed i emanate ahd 
( ®) {p) , 
g ae 
‘ fa te)” { wet v e (8.5e5} 
aan eo 5 co) ¢ Tam Tes , OY i 
By using (8.2.6) we see that 
rg + af), (p) (peqey) te (p),, : (8.6.6) 


ena, thorefore, that (8.5.5) 4e aqual to 7.(¥). 


< 





Any 


aZhQe 
By Theoren 4.2.8, te atts o B®, ¥,] axel we obtain the 
following equations fron (Belted) » {Belbold)5 amd (SelboS) 5 


ae ye SF ae (FE) G04)” () setaoy’ vee (8.5.7) 
4° go wd wo a ee 
ag *"] = Me se 8 en” ts (8.5.8) 
@ (aia), xO kad 0 ps Ce) On 
0° (m) (p) 
: i” RM geQ ed ved ned Ae 


She castes (8.5.7) = (8.5.9) are comlitionally eonvergent. We 
4lingtrate the proof for eqnation (8.5.8) « By Theoren e208 the double 
infinite series (8.5.8) amverges. Neglecting the cmstant mitipiier 
(nm) 
faa the correspomiine series ef absolute values is 

2 
tents). (pra) 
0 ed wed ferris Hew pra . 


Using (6.5.3) we can write equation (6.5.10) as 


° (8.5089) 


oo (a), <P) 
F(donspamine2, pias tei) + BF “s (Ty Stee ey (8.5023) 
Sei bed yd (mime) (pig) 


which diverges. Time (8.5.8) 4s conditionally convengente Siuilar proofs 


chow that (8.5.7) and (8.5.9) aleo converge conlitionally. 


It can be verified that Af 7, Tr, satiefies equation (4.2.88). ‘The 
Algebra Le otreightformead but tedious und will not be reproduced here. 





= Deo 
6.6 Soness Gains 
the axpested gain of the tm-state Markov chein conaidered in this 
chapter is, ty (4.4.3), 


2 2 
e(H) © Ji = 76%, 00) BD (8.6.4) 
If the reuard matrix R is given ty (8.2.9), the expected gain is 
= co 6k (m4), o(P) 
ame 5 5 (K)(Wi)’ Ca Be 
= tx und (monet) PDs 
een (nent ae unt an  prg Caraed) Crary, 


applying (8.2.4) » (8.2.5) an® (8.2.6), wo have 


Kk 
= yteny” ew Pvt 8 (onmuyrtror So. 
0 $ 


mint ne PHD wg 
{8.623) 


It ig clear thet (8.6.3) must converge conditlonally, 


5.7 fotel Discounted Resaxi Yeator. 
Tho exmected Glecoanted reamed ever an infinite period when the aysten 
starts in state 4 4s given by equation (4.3.7) as 


2 2 
V.moe f BH m(H) ° 
Ve (4) peo B es —e Pag (To (B)) Pa (H) Bagee mate (Be7 ei) 
Let 


aes may) ae 
gr anage DB” Bis D> —-Bnde2e (86742) 











ee Ze hoo 
Ror (3,4) a (1,2), We have, ty equation (8.305) 








oh 
w= Ze t’s s (8) (=3)” MP 
Awt wen kd (mntt), (ra, 
ie) ko 
ped ied wpa mend), (Pra), 
(8.703) 
(m),_¢ hy 
p 
Cy (01)” aT o |e Catwig)"]| <1, (8.7.4) 
. :,- kev v 
=e (a), (p) 
a), (p 
fo. SF eis gen” _ ev 
min Mad (arned), PMD, 





<p rs c av" s B = (p4*i) ee (8.75) 
pe led pe 


The ratic test shows that z (+i) 6“ converges, hense, wo may inter 
P 
change the firct to summticn operaters in (8.7.3) t obtain 


: co 4 (a) (p) 
5,2) __ Ba aa ° ie (+%)” tewy e (5.7.6) 
ten" © (lep)Grim) Ios es) YY e" Gaia), (pea) g 


Heglecting the constant miktipiies, the series of eheointe wilnes 
compespomding te (8.7.6) ta, upon interchanging the order of summation, 
7 Eee ge 
0 IO tv (minal), (peg), 


a F(Letteps mithle pqs By Bd» (8.7 °F) 


‘where Fo(as8e8"syev’s Ze7) de Appell's sevond hypergesmetehe function of 





a 











aPOP 
two variables [?]. Appell has shown that the series (3.7.7) eotverges if 
0<8<5 and diverges 4f dee B<i. The ease 8B = S has not yet been 
investigated. We conelude thet (8.7.6) converges absolutely for 0< s<4 
and, since Theorem 4.3.2 implies the convergence of (3.7.6), that the 
series converges conditionally for $<p< i, 


For (4,3) = (2.1) we use equatiun (8.3.7) to obtain 





k (m2), (p) 
sets ZF zc atten hy vt (8.7.8) 
k=Q un min), (pra os 


the series converging absolutely for 0< a<4 and conditionally fer $a <i. 
The rensining cases are 


so pra. A ag = ed 
Sy) = FOC BOD = eg 8, 0(09 (8.7.9) 
ard 
Spall) = Ta - (0). (8.7.10) 


The expected Giscounted reward starting froa state i is 
« e 2a P22 
v(m) = in Py) Fy, + a hah By 475 (8D) By (i) a (8.7048) 
Using the reward matrix (8.1.9) and the formulas (8.7.6) end (8.7.9), 
we obtain, upon collecting terns, 
ma + nb n@ oo 


Vv.) = 


eatmeneienel + cmommmmmene §, : cK a (-1)” 
1s (ieB) (nen) (48) (men) eee 


(m), y), e(pty) + dq a(intkeu) + b(net) 
x x ee ee ee « (8.7012) 
(mined) (pra), prqte mine Lek 


In a similar manner wo find 





=223= 


me op + dq i a ae (a), (p) 
(298) (prq) 1-8 ke ved (men), (Pt) 


(8.713) 








atmtk-v) + tn o(prved) + dq 7] 

x ~ e 
mrtirticoy prqevel 

It can be shown that V4) and Vo(H) satisfy equation (4.3.9). 


8.8 & Generalization of s Result of Shor. 

N. Ze. Shor [37] has considered a game-theoretie wodel of a tww-state 
Markov chain with elternatives and rewards. He shows that uner certain 
circumstances each player should act so as to maximize his expeeted one-step 
transition reward. This result 19 generalized here. 

Consider a twoestate Markov chein with K, alternatives in state 
i(4=1,2). Assume that the rewards depend only on the initial and final 
states 4 and j and not on the alternative used in making a transition 


from i to j. Assume further that the reward matrix is 


R= tr * (8.8.1) 
where, if r 4s any real number, 
| rhe a3; URL, cacy Ky {8.8.2a) 
ss ert, katy cons Ky (8.8.2b) 
rss a r?@ Kel, coos K, (8.8.2c) 
wot rte + Ay Kel, coay Ky (8.8.2) 


We require thet 020 and A, 2A,= 0. 
Let ¢ “tb, 5) be the matrix of alternative transition probabil tics 
and let =e have the prier distribution function aelr). if (2 ly) 


if, 


a4) 





2 Fife 
is the marginal distritation function of Ro), 4% 4s assumed that, for 
all Tek, Fy (2 iY) 23 contémeus on the boundary of Af ye 
The expected gain of the systen uniter the poliey T is e(S, 1). 
Suppose it is desired te choose a policy, Cae which maximises the 
expected gains 
aC sie Se Lee 4}. (8.8.3) 


We shall show that, with the reward structure (8.8.2), it Is sufficient te 
solve the corresponding deterministic problen for @ (+) = ae\ +3 

and that the optimal policy, © = (o*, 0%)» As determined by the 
equations : | 
: Pt) = | mee crt 4=1,2, (8.8.4) 
Pye kei,..K, . Pao ; “Fo a 


Let a” eae) be the conditional total expected reward in n 
transitions under the policy © when the system starts from state 4 (41,2) 
ena P(r) =P. Let Wx, 1) be the corresponding unconditional 
expected reward, 


ae) & : afr Pare (B\ 1). (8.8. 5} 
A 4=1,2 
HBL, 2,200 
Se 


| Lemme, 8.8.1 For nel,2,.0. and all o-e%, 
amr) - UP (at<a . (8.8.6) 


Froof, We chow that, fer all ped, Py 
: a) ¢0-,p) = of) (ep) A 43 11,25 000 (8.8.7) 
4 = 2 CE 


from chich equation (8.8.6) follews, sines v aa by is a set of 


measure sero relative to F, (P\1). 


i] 
iW 
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Let f jouer) be the expected number of transitions from state 1 to 


state j inn transitions when the system starts from state u. Then, for 
bs} 
ali ped, ; 


cc, P) o m+ p C2, (ton) + £,o(en)] a AF, (ion) +A,2, (Lor). 


421,2 (8.8.8) 
F815 25 000 
o 

re P is represented as in (8.4.2) and the eigenvalues of P are A ai and 


A Showy we can use the spectral representation of P given by (8.1.4) 
tegether with the expression for £ go2en) given by equation (6.1.25) to 
obtain 


2<,," 
ae ,2) - (ep) = 2 Co 04nd, = (1ey)A J. (88.9) 
t bd 2 & 1 he i 2 


Since e2o and A,24, > O» 





te A,” 
a rep) = OC o 42) 2 Be (meget A 
t-a," 
= oh 2 A, 7 (8.8.10) 
a = Ao 
if Oe, <i, then 
4 = Ao" 
whereas, if @1<1, <0, 
4-2," 2 4 
@ pn ea SE = e006 = 
pi Rod, (Ler + ay + +X, ) 
bad e ee » 
4 hod, <A, (8.8.12) 


In either case, wo obtain (8.8.7). QelieDe 


~ vie 


A 
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Lemma 8.8.2 For ini,2 end © et, 


nF eet) = eG). (8.8.13) 
nm 
Broek. Since 
dn) aaten aah Fo eacuerates e —— 


equation (6.1.14) ylelds 
2 ned 


ha e ade) 
(cot) = Ee Yop ey Fs Pop Pee J (8.8.25) 
WROPS 
B, C8 Shee e aL p nar, (PY). (8.2.16) 
Ba 


Let ¢> 0 be given. By a trivial extension of Theoran 4.2.5, there 
exists an integer v> 0 such that, if k > v, 

k) s. 
Eo fF, a J- Be CB, it.1|< 5 (8.8.87) 





Then, for n> VU», 


4 mi 3 
Both tJ- a FB elie me) 














a eC, *1- Bs i + ae 
Ze (8.8218) 
for n sufficiently large. fms, 
Mn 4 = Bp, MO} = Bp lh, ¥,) (8.8.19) 
Reco =f 
and, by (Helelt), 
tin 2 Ee ty) 2 Boe). (8.8.20) 


N-Poo 


id a 
Theorem 8.8.3 let £ = (0, +75) be a policy such that 








we pe 


g 
Bs CEES Lohse aly $ aces : agi? Ce 
Then = 
ao’ )> aoet). re (8.8.22) 
Eroof,. We Mrst establish ty induction that 
a er) A r,4). aoi,2 (8.8.23) 
1,2 ps0 
| get 
For n=i, gq & ae o({Z 
at te eT) re PS (r) 4,2 a ot) 


aft o} o(1) 
es 4) SB r++ Pop (14, = G, (o,%)- (8.8.2!) 
cL 


Agsune (8.8.23) holds for ne For imi, 


= > ae = @ “4 2 
Batt) © Baler) Ces MCCS 497 + BOM) Cea, + WME HD) 
ere Biter ita, + Orit) = a Casey Bash). 


(8.8.25) 
Since 7” 4a an optimal poliay for a transition interval of length n, we 
haves for ail Oo es 
a c~9 gv, a @ 2 
rey) are PX TE A,+ are) = Mr 3+) @ am ea* ty 
(8.8.26) 


em, by (8.8.24) aril Lemma 8.8.1, 
arse) = Hc, +) 


(n) ( 


A) 
DUB) = Baked Ca, + BM h ty = Bee 11 


=O (8.8.27) 
AmMlariy, 





»- 


N 


=2280 


o(rri) | Pa Pe {9 al 7 os) o{n) eer 
Gp (ester so +B Ar) La, + Meet) - Welty Bos), 
(8.8.28) 


and, since A,2A,s 


Bee 53.49 = 2a 


O20) BEC La, + Oct) - Ae e)3 


20, Ce (8.8.29) 
proving the inducticn. 


me dquatiion (8.8.23) and Lemma 8.8.2 together imply that, for all <= cZ, 
| acces Ba 2 rte) 


iv 


in 2 Gee, +) 


n> nn 4 


a(S 1). (8.8.30) 


QeBeDe 





CHAPTER 9 
CONCLUDING REMARKS 


in the foregoing chapters we have deseribed a formal structure for 
certain bread classes ef sequential sampling and fixed sample=-aise decision 
problems in a Markov chain with unknow transition probabilities. Sines 
there is very Mttle theory in this area most of our efferts have been 
divested toward anavering questions of existence and convergence. For 
this reason the portions of thio report which deal. with merical 
computation set forth the obvious, but not necessarily the most efficient, 
ways to approach problems of caleulation. It dees seem clear, however, 
that, for problems with a large number of states in which a hich degrea ef 
acouracy is required, we mist thine in terms of hours, not mimtes, of 
computer time. Thies 1a not to say that the Bayesian methed of dealing 
with Markov chains with unsertein transition probabilities met be abendenc’ 
as impractical. But it met be recognised that, at the present state of 
the art, the Bayesian treatment is prebably most practical fer problems with 
OF ives states, loose prior distritutions, and large differences in 
the reumrds ascoclated with cifferent actions. As problems tend to differ 
frem these criteria, the decision-maker mst balance ineressing computation 
tine against the remiired accuracy of the solution and choose an anpropriata 
approina tion. 

Thore are numerous questions of Immediate interest which remain 
unanswered, some theoretical and come mamerical. Many of these are isted 
below. . 
1. Certain error bounds were derived in Chapters 3, 4, and 5 which
depend on the discount factor, β, but not on τ, the parameter of the prior
distribution. These bounds should be made tighter for specific prior
distributions by including factors which involve τ.


2. The rate of convergence of the successive approximation methods
developed in Chapters 3, 4, and 5 depends upon the choice of terminal
functions. Classes of terminal functions which accelerate this convergence
rate should be investigated.


3. The analysis of undiscounted adaptive control models by letting
β → 1 in the corresponding discounted problem may provide a workable
approach to a difficult matter. The remarks of Section 3.6 are relevant
in this connection.


4. The question of the uniqueness of solutions to equations (4.2.40)
and (4.2.55) is of considerable importance for the calculation of the
means and product moments of the steady-state vector π when a method
of successive approximations is used. The convergence of
the approximant π̄(n, τ), as defined by equation (4.2.42) with the
terminal function (4.2.52), is also of importance.


5. In the terminal control models of Chapter 5 it is necessary to
evaluate expressions of the form

[expression illegible in the source]

At present the only method of finding the maximizing policy δ* is by direct
search over the elements of Δ. More efficient methods of finding δ* should
be investigated; approximations to T* of the sort described in Section 5.4
should also be studied.


6. A formal analysis of the undiscounted terminal control models III
and IV, which were introduced in Section 5.5, should be carried out. This
analysis would examine such questions as the existence and uniqueness of
solutions, the convergence of successive approximation methods, and whether
a terminal decision point is reached with probability one under an optimal
sampling strategy. In this regard it is to be noted that equations (5.5.2)
and (5.5.3) can be made more precise by replacing the expression

[expression illegible in the source]

by the expression

[expression illegible in the source]

where w_i(δ, τ) is the expected relative value* of starting the system in
state i and operating it indefinitely under the policy δ when the prior
distribution function of P is H(P | τ). Methods of computing w_i(δ, τ)
have not yet been studied.


7. There are well-known difficulties in assigning a multivariate
prior distribution to the elements of P in such a manner as to accurately
reflect the decision-maker's state of knowledge. It would be of considerable
interest, therefore, to investigate the sensitivity of some of the models
of the foregoing chapters to relatively small changes in the prior distribution.
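Item 5 refers to finding a maximizing policy by direct search over Δ. Because Δ contains one alternative per state, its size is the product of the numbers of alternatives in the states, which is what makes direct search expensive. The sketch below shows the search only for the simplest case of a known transition matrix, scoring each policy by its discounted value (I − βP_δ)^(-1) q_δ at a chosen starting state; in the terminal control models the score would instead be an expectation under the prior, which is not reproduced here. The names `discounted_value` and `direct_search` and the layout `p[i][k][j]`, `q[i][k]` (state, alternative, successor) are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(b)
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def discounted_value(policy, p, q, beta):
    """v = (I - beta P_delta)^(-1) q_delta for a known chain and a fixed policy."""
    n = len(policy)
    a = [[(1 if i == j else 0) - beta * p[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [q[i][policy[i]] for i in range(n)]
    return solve(a, b)


def direct_search(p, q, beta, start=0):
    """Enumerate every policy in Delta; keep the best value at `start`."""
    best_policy, best_value = None, None
    for policy in product(*(range(len(alts)) for alts in p)):
        v = discounted_value(policy, p, q, beta)[start]
        if best_value is None or v > best_value:
            best_policy, best_value = policy, v
    return best_policy, best_value


if __name__ == "__main__":
    # state 0: alternative 0 stays put (reward 1); alternative 1 moves to state 1 (reward 0)
    # state 1: a single alternative that stays put (reward 3)
    p = [[[1, 0], [0, 1]], [[0, 1]]]
    q = [[1, 0], [3]]
    print(direct_search(p, q, Fraction(1, 2)))
```

In the example, staying in state 0 earns 2 in discounted value while moving to the more rewarding state 1 earns 3, so the search returns the policy (1, 0).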





* Cf. Howard [22], Ch. 4.
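For a known transition matrix, the gain g and relative values w_i of a fixed policy satisfy Howard's value-determination equations g + w_i = q_i + Σ_j p_ij w_j, with one relative value set to zero. The sketch below solves these equations exactly for a known chain only; it does not address the Bayesian quantity w_i(δ, τ), whose computation the text notes is still open. The function name and the choice of fixing the last relative value at zero are assumptions of this sketch.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(b)
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def gain_and_relative_values(p, q):
    """Value determination for a fixed policy of a known ergodic chain.

    Unknowns: g, w_0, ..., w_{N-2}; the last relative value is fixed at 0.
    Equation i:  g + w_i - sum_j p[i][j] * w_j = q[i].
    """
    n = len(q)
    a, b = [], []
    for i in range(n):
        row = [1]  # coefficient of the gain g
        for j in range(n - 1):
            row.append((1 if i == j else 0) - p[i][j])
        a.append(row)
        b.append(q[i])
    x = solve(a, b)
    return x[0], x[1:] + [Fraction(0)]
```

For p = [[1/2, 1/2], [1/2, 1/2]] and q = [1, 3], this gives g = 2 (the steady-state average of the rewards) and relative values w = [-2, 0].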
In addition to these and other immediate questions which arise in
connection with the research reported in this study, there are several
fairly obvious directions in which this research can be extended. For
example, many of the results and techniques developed here should be
capable of extension to decision problems in a semi-Markov chain in which
both the transition probabilities and the parameters of the holding-time
distributions are uncertain.

More general stochastic processes should be amenable to Bayesian
analysis, although techniques different from those used here may be
required. The Wiener process, for example, can be analyzed with the
existing Bayesian theory for normal processes.
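As an illustration of that normal theory: if X(t) = μt + σW(t) with σ known and a normal prior on the drift μ, the observed path up to time t enters the likelihood for μ only through X(t), which is normal with mean μt and variance σ²t. The posterior for μ is therefore normal. The sketch assumes a known σ² and a prior N(m0, s0²); it is the standard conjugate update, not a method taken from this report, and the function name is illustrative.

```python
def wiener_drift_posterior(m0, s0_sq, sigma_sq, t, x):
    """Posterior (mean, variance) of the drift mu of X(t) = mu*t + sigma*W(t).

    Prior: mu ~ N(m0, s0_sq).  Observation: X(t) = x, with sigma_sq known.
    As a function of mu, the likelihood is proportional to
    exp(-t * (mu - x/t)**2 / (2 * sigma_sq)): precision t/sigma_sq at x/t.
    """
    precision = 1.0 / s0_sq + t / sigma_sq
    mean = (m0 / s0_sq + x / sigma_sq) / precision
    return mean, 1.0 / precision
```

Precisions add, so a longer observation period t sharpens the posterior in exactly the way a larger sample does in the normal model.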
APPENDIX A
GLOSSARY OF SYMBOLS


Meaning

Beta function.
Generalized beta function.
Discount factor.
Sampling cost when the system is observed in state i.
Expected one-step sampling cost when the process is in state i and alternative k is to be used.
Maximum sampling cost.
Covariance operator.
Error of the nth successive approximant in adaptive and terminal control problems.
Error of the terminal decision.
Expectation operator.
Sum of the ith row of a transition count.
Sum of the ith column of a transition count.
Number of transitions from state i to state j in n transitions when the process starts from state u.
Expected value of [illegible].
Multivariate beta probability density function.
Nonstandardized multivariate beta probability density function.




Beta-Whittle probability mass function.
Beta-Whittle-2 probability mass function.
Nonstandard beta-Whittle-2 mass function.
Matrix beta probability density function.
Nonstandardized matrix beta probability density function.
Matrix beta-1 probability density function.
Whittle probability mass function.
Whittle-1 probability mass function.
Whittle-2 probability mass function.
Transition count.
Multivariate beta probability distribution function.
Marginal distribution function of [illegible].
Appell's second hypergeometric function of two arguments.
Family of probability distribution functions, H(P | ·).
Expected reward per transition, or process gain, when [illegible].
Expected gain under the policy δ.
Gamma function.
Probability distribution function for the generalized stochastic matrix.


Family of probability distribution functions, H(P | ·).
Likelihood function for the sample Z.
K × N generalized stochastic matrix (K ≥ N).
An N × N stochastic matrix consisting of the N rows of P specified by δ.
Transpose of the matrix P.
Expected value of P.
n-step transition probability for the transition matrix P.
Expected n-step transition probability under the policy δ.
Steady-state probability vector of an ergodic Markov chain with transition matrix P.
Expected steady-state probability vector.
The nth successive approximation to the expected steady-state probability vector.
Expected value of [illegible].
Set of all transition counts of size n which start in state u and end in state v when P is the matrix of transition probabilities.
Set of all transition counts of size n which start in state u when P is the matrix of transition probabilities.
Set of all transition counts of size n when P is the matrix of transition probabilities.
Set of all transition counts of size n which start and end in the same state when P is the matrix of transition probabilities.
Set of all transition counts of size n which start and end in different states when P is the matrix of transition probabilities.
Set of all transition counts of size n which start in state u and end in state v when the matrix of transition probabilities is positive.
Set of all transition counts of size n which start in state u when the matrix of transition probabilities is positive.
Set of all transition counts of size n when the matrix of transition probabilities is positive.
Set of all transition counts of size n which start and end in the same state when the matrix of transition probabilities is positive.
Set of all transition counts of size n which start and end in different states when the matrix of transition probabilities is positive.
Generic symbol for the parameters of a probability distribution function.
Admissible parameter set.
True state of nature.
Expected one-step transition reward when the system is in state i and alternative k is to be used.
The n-step transition probability under the policy δ when P is the true state of nature.


Expected discounted reward in n transitions when the system starts in state i.
The K × N matrix of one-step transition rewards.
The N × N matrix of one-step transition rewards consisting of the N rows of R specified by δ.
Maximum element of R.
Minimum element of R.
The element of R with the largest absolute value.
The element of R with the smallest absolute value.
Range set of a random vector with the nonstandardized multivariate beta distribution.
Range set of a random matrix with the nonstandardized matrix beta distribution.
Set of all K × N generalized stochastic matrices.
Set of all N × N stochastic matrices.
Set of all N × N positive stochastic matrices.
Set of all N × N stochastic matrices with elements in the closed interval [illegible].
Policy vector.
Expected value of [illegible].
Set of all policy vectors.




Parameter of the posterior distribution when the prior parameter is τ and a transition from state i to state j under alternative k is observed.
Parameter of the posterior distribution.
Parameter of the posterior distribution when the system starts in state i and is observed in state j after n transitions under the policy δ.
Expected total discounted reward over an infinite period when the system starts in state i and an optimal sampling strategy is followed.
The nth successive approximation to v_i(τ).
Expected total reward over a period with terminal operation phase of [illegible].
Expected total discounted reward over an infinite period when the system starts in state i with the policy δ in force and an optimal sampling strategy is followed (discounted process with set-up cost).
Minimum of a set of constant terminal reward functions.
Maximum of a set of constant terminal reward functions.
Terminal reward function.
Bound on the terminal reward functions.


Expected total discounted reward over an infinite period when the transition matrix is P, the system starts from state i, and a fixed policy is used.
Expected total discounted reward over an infinite period when the system starts from state i and the policy δ is used.
The nth successive approximation to [illegible].
The variance operator.
A sample of n + 1 states occupied by a Markov chain.
APPENDIX B


PROGRAM VITERATION TO SOLVE
EQUATIONS (3.2.1) AND (3.2.2).


          RPROGRAM NAME IS VITERATION
          RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF V(I,T1,M) FOR
          RI=1,...,N AND T1=S,...,T, FOLLOWING AN OPTIMAL POLICY. THE
          RREWARD MATRIX IS R AND THE TERMINAL REWARD VECTOR IS RHO.
          RTHE MAXIMIZATION IS OVER THE MU(I) ALTERNATIVES IN STATE I.
          RBETA IS THE DISCOUNT FACTOR. A MATRIX BETA PRIOR IS
          RASSUMED.
           PROGRAM COMMON R,RHO,RDIM,IND,MU,N,BETA,IND1,LIST,TOP,
          0INDIM,POL
           INTEGER IND,MU,N,IND1,TOP,S,T,I,J,K,MAXSP,N1,POL,V2
           DIMENSION R(500,RDIM),M(500,RDIM),RHO(10),IND(100,INDIM),
          0MU(10),IND1(10),LIST(21000),V1(10),V2(10)
           VECTOR VALUES RDIM=3,1,20,30
           VECTOR VALUES INDIM=2,1,0
READ       READ FORMAT INPUT1, N,S,T,MAXSP,BETA
           PRINT FORMAT OUT1A, BETA
           RDIM(2)=N
           RDIM(3)=N
           INDIM(2)=N
           IND1(1)=0
           IND(1,1)=0
           THROUGH ALFA1, FOR K=1,1,K.E.N
ALFA1      IND(1,K+1)=K*N
           N1=N*N
           THROUGH ALFA, FOR K=1,1,K.E.MAXSP
           IND1(K+1)=K*N
           IND(K+1,1)=K*N1
           THROUGH ALFA, FOR I=1,1,I.E.N
ALFA       IND(K+1,I+1)=K*N1+I*N
           READ FORMAT INPUT2, MU(1)...MU(N)
           PRINT FORMAT OUT1E, (I=1,1,I.G.N, MU(I))
           READ FORMAT INPUT3, R(1,1,1)...R(MU(N),N,N), M(1,1,1)
          0...M(MU(N),N,N), RHO(1)...RHO(N)
           PRINT FORMAT OUT1D, (I=1,1,I.G.N, RHO(I))
           PRINT FORMAT OUT1B, (I=1,1,I.G.N, (K=1,1,K.G.MU(I),
          0(J=1,1,J.G.N, M(IND(IND1(K)+I)+J))))
           PRINT FORMAT OUT1C, (I=1,1,I.G.N, (K=1,1,K.G.MU(I),
          0(J=1,1,J.G.N, R(IND(IND1(K)+I)+J))))
           SET LIST TO LIST
           LIST=0
           THROUGH DELTA, FOR K=S,1,K.G.T
           THROUGH GAMMA, FOR I=1,1,I.G.N
           V1(I)=VMAX.(I,K,M)
GAMMA      V2(I)=POL
           PRINT FORMAT OUT2, K, V1(1)...V1(N)
DELTA      PRINT FORMAT OUT3, V2(1)...V2(N)
           TRANSFER TO READ
          RFORMAT SPECIFICATIONS
           VECTOR VALUES INPUT1=$4I10,F10.5*$
           VECTOR VALUES INPUT2=$10I7*$
           VECTOR VALUES INPUT3=$(7F10.5)*$
           VECTOR VALUES OUT1A=$7H1BETA =,G15.5*$
           VECTOR VALUES OUT1B=$6H0M   =,8G15.5/(8G15.5)*$
           VECTOR VALUES OUT1C=$6H0R   =,8G15.5/(8G15.5)*$
           VECTOR VALUES OUT1D=$6H0RHO =,8G15.5/8G15.5*$
           VECTOR VALUES OUT1E=$5H0MU =,10I5*$
           VECTOR VALUES OUT2=$8H0FOR T =,I2,14H    V(I,T,M) =,7G15.5/
          08G15.5*$
           VECTOR VALUES OUT3=$7H POLICY,10I5*$
           END OF PROGRAM


           EXTERNAL FUNCTION (I1,N1,M)
           ENTRY TO VMAX.
          RTHIS FUNCTION RECURSIVELY COMPUTES MAX V(I1,N1,M)=Y, THE
          RMAXIMUM EXPECTED RETURN IN N1 STEPS IF THE SYSTEM STARTS IN
          RSTATE I1 WITH PARAMETER MATRIX M. PRIOR DISTRIBUTION IS
          RMATRIX BETA. MAXIMIZATION IS OVER THE MU(I1)
          RALTERNATIVES IN STATE I1.
           PROGRAM COMMON R,RHO,RDIM,IND,MU,N,BETA,IND1,LIST,TOP,
          0INDIM,POL
           INTEGER I1,N1,I,N2,K,IND,MU,N,RDIM,J,IND1,TOP,POL
           DIMENSION R(500,RDIM),RHO(10),RDIM(3),IND(100,INDIM),
          0LIST(21000),MU(10),IND1(10),INDIM(2),TM(500,RDIM)
           I=I1
           N2=N1
           Y=-1.E35
           WHENEVER N2.E.0, FUNCTION RETURN RHO(I)
           THROUGH ALFA, FOR K=1,1,K.G.MU(I)
           MSUM=0.
           THROUGH PHI, FOR J=1,1,J.G.N
PHI        MSUM=MSUM+M(IND(IND1(K)+I)+J)
           STOR=0.
           THROUGH GAMMA, FOR J=1,1,J.G.N
           SAVE RETURN
           SAVE DATA N2,POL,MSUM,STOR,Y,M(K,I,J)...M(MU(I),N,N),I,J,K
           EXECUTE TR1.(I,J,K,M,TM)
           X=VMAX.(J,N2-1,TM)
           RESTORE DATA K,J,I,M(MU(I),N,N)...M(K,I,J),Y,STOR,MSUM,
          0POL,N2
           RESTORE RETURN
GAMMA      STOR=STOR+(M(IND(IND1(K)+I)+J)/MSUM)*(R(IND(IND1(K)+I)+J)
          0+BETA*X)
           WHENEVER STOR.LE.Y, TRANSFER TO ALFA
           Y=STOR
           POL=K
ALFA       CONTINUE
           FUNCTION RETURN Y
           END OF FUNCTION


           EXTERNAL FUNCTION (I1,J1,K1,M,TM)
           ENTRY TO TR1.
          RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
          RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
          RTR1.(I1,J1,K1,M)=TM, WHEN A TRANSITION IS OBSERVED FROM I1 TO
          RJ1 UNDER ALTERNATIVE K1. PRIOR DISTRIBUTION IS MATRIX BETA.
           PROGRAM COMMON R,RHO,RDIM,IND,MU,N,BETA,IND1,LIST,TOP,
          0INDIM,POL
           INTEGER I1,J1,K1,I,J,K,IND,MU,N,IND1
           DIMENSION R(500,RDIM),RHO(10),RDIM(3),IND(100,INDIM),
          0LIST(21000),MU(10),IND1(10),INDIM(2)
           THROUGH ALFA, FOR I=1,1,I.G.N
           THROUGH ALFA, FOR J=1,1,J.G.N
           THROUGH ALFA, FOR K=1,1,K.G.MU(I)
ALFA       TM(IND(IND1(K)+I)+J)=M(IND(IND1(K)+I)+J)
           TM(IND(IND1(K1)+I1)+J1)=M(IND(IND1(K1)+I1)+J1)+1.0
           FUNCTION RETURN
           END OF FUNCTION



APPENDIX C


PROGRAM PHI MATRIX TO
COMPUTE EQUATION (4.1.2)


          RPROGRAM NAME IS PHI MATRIX
          RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF PHI(I,J,T1,M)
          RFOR I,J=1,...,N AND T1=S,...,T. A MATRIX BETA
          RPRIOR IS ASSUMED.
           PROGRAM COMMON IND,N,J,LIST,TOP,MDIM
           INTEGER N,IND,I,J,S,T,K,TOP
           DIMENSION M(100,MDIM),IND(10),LIST(21000),F(10)
           VECTOR VALUES MDIM=2,1,20
READ       READ FORMAT INPUT1, N,S,T
           MDIM(2)=N
           IND(1)=0
           THROUGH ALFA, FOR K=1,1,K.E.N
ALFA       IND(K+1)=K*N
           READ FORMAT INPUT2, M(1,1)...M(N,N)
           PRINT FORMAT OUT1, N,S,T,(K=1,1,K.G.N,(L=1,1,L.G.N,
          0M(IND(K)+L)))
           SET LIST TO LIST
           LIST=0
           THROUGH GAMMA, FOR K=S,1,K.G.T
           THROUGH GAMMA, FOR I=1,1,I.G.N
           THROUGH DELTA, FOR J=1,1,J.E.N
DELTA      F(J)=PHI.(I,K,M)
           F(N)=1.0
           THROUGH EPS, FOR J=1,1,J.E.N
EPS        F(N)=F(N)-F(J)
           WHENEVER I.E.1
           PRINT FORMAT OUT2, K, F(1)...F(N)
           OTHERWISE
           PRINT FORMAT OUT3, F(1)...F(N)
GAMMA      END OF CONDITIONAL
           TRANSFER TO READ
          RFORMAT SPECIFICATIONS
           VECTOR VALUES INPUT1=$3I10*$
           VECTOR VALUES INPUT2=$(7F10.5)*$
           VECTOR VALUES OUT1=$3H1N=,I5,4H S=,I5,4H T=,I5/
          0(1H ,8G15.5)*$
           VECTOR VALUES OUT2=$7H0FOR T=,I2,15H PHI(I,J,T,M)=,
          06G15.5*$
           VECTOR VALUES OUT3=$S24,6G15.5*$
           END OF PROGRAM
           EXTERNAL FUNCTION (I1,T1,M)
           ENTRY TO PHI.
          RTHIS FUNCTION RECURSIVELY COMPUTES PHI(I1,J,T1,M)=Y, THE
          RPROBABILITY THAT AT TIME T1 THE SYSTEM WILL BE IN STATE J,
          RGIVEN THAT AT TIME 0 IT WAS IN STATE I1 WITH PARAMETER
          RMATRIX M. PRIOR IS MATRIX BETA.
           PROGRAM COMMON IND,N,J,LIST,TOP,MDIM
           INTEGER I1,J,T1,I,T,K,N,IND,TOP,MDIM
           DIMENSION IND(10),LIST(21000),MDIM(2),TM(100,MDIM)
           I=I1
           T=T1
           MSUM=0.
           THROUGH ALFA, FOR K=1,1,K.G.N
ALFA       MSUM=MSUM+M(IND(I)+K)
           WHENEVER T.E.1, FUNCTION RETURN M(IND(I)+J)/MSUM
           Y=0.
           THROUGH BETA, FOR K=1,1,K.G.N
           SAVE RETURN
           SAVE DATA Y,T,MSUM,M(1,1)...M(N,N),I,K
           EXECUTE TR.(I,K,M,TM)
           X=PHI.(K,T-1,TM)
           RESTORE DATA K,I,M(N,N)...M(1,1),MSUM,T,Y
           RESTORE RETURN
BETA       Y=Y+(M(IND(I)+K)/MSUM)*X
           FUNCTION RETURN Y
           END OF FUNCTION


           EXTERNAL FUNCTION (I,K,M,TM)
           ENTRY TO TR.
          RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
          RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
          RTR.(I,K,M)=TM, WHEN ONE TRANSITION FROM I TO K IS OBSERVED.
          RPRIOR IS MATRIX BETA.
           PROGRAM COMMON IND,N,J,LIST,TOP,MDIM
           INTEGER I,K,IND,J,L,N,J1,MDIM,TOP
           DIMENSION IND(10),MDIM(2),LIST(21000)
           THROUGH ALFA, FOR J1=1,1,J1.G.N
           THROUGH ALFA, FOR L=1,1,L.G.N
ALFA       TM(IND(J1)+L)=M(IND(J1)+L)
           TM(IND(I)+K)=TM(IND(I)+K)+1.0
           FUNCTION RETURN
           END OF FUNCTION
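The recursion in PHI and TR can likewise be sketched in Python: φ(i, j, t, M) = Σ_k (m_ik / m_i) φ(k, j, t−1, M + e_ik), with φ(i, j, 1, M) = m_ij / m_i, where m_i is the ith row sum of M and e_ik adds one to element (i, k). Indices run from 0, exact fractions are used, and the function names are illustrative.

```python
from fractions import Fraction


def tr(m, i, k):
    """TR step: copy of m with m[i][k] increased by one."""
    return tuple(
        tuple(c + (1 if (a, b) == (i, k) else 0) for b, c in enumerate(row))
        for a, row in enumerate(m)
    )


def phi(i, j, t, m):
    """Expected t-step transition probability from i to j under a matrix beta prior m."""
    msum = sum(m[i])
    if t == 1:
        return Fraction(m[i][j], msum)  # mean of the (i, j) element
    return sum(
        Fraction(m[i][k], msum) * phi(k, j, t - 1, tr(m, i, k))
        for k in range(len(m))
    )
```

With M = ((1, 1), (1, 1)), φ(0, 0, 2, M) = 7/12 rather than the 1/2 obtained by squaring the mean matrix, because each hypothetical transition updates the prior before the next step is taken.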
APPENDIX D


PROGRAM PIAPROX TO COMPUTE
EQUATIONS (4.2.42).


          RPROGRAM NAME IS PIAPROX. THIS PROGRAM RECURSIVELY COMPUTES
          RVALUES OF THE SUCCESSIVE APPROXIMANT PI(I,T1,M) FOR
          RI=1,...,N AND T1=S,...,T. A MATRIX BETA PRIOR IS USED.
           PROGRAM COMMON IND,N,LIST,MDIM,N1,ADIM,AIND
           INTEGER N,IND,I,K,S,T,N1,AIND
           DIMENSION M(100,MDIM),IND(10),LIST(21000),F(10),AIND(10)
           VECTOR VALUES MDIM=2,1,20
           VECTOR VALUES ADIM=2,1,0
READ       READ FORMAT INPUT1, N,S,T
           MDIM(2)=N
           N1=N+1
           ADIM(2)=N1
           AIND(1)=0
           IND(1)=0
           THROUGH ALFA, FOR K=1,1,K.E.N
           AIND(K+1)=K*N1
ALFA       IND(K+1)=K*N
           READ FORMAT INPUT2, M(1,1)...M(N,N)
           PRINT FORMAT OUT1, N,S,T,(K=1,1,K.G.N,(I=1,1,I.G.N,
          0M(IND(K)+I)))
           SET LIST TO LIST
           LIST=0
           THROUGH GAMMA, FOR K=S,1,K.G.T
           THROUGH DELTA, FOR I=1,1,I.G.N
DELTA      F(I)=PI.(I,K,M)
           PRINT FORMAT OUT2, K, F(1)...F(N)
           SUM=0.
           THROUGH BETA, FOR I=1,1,I.G.N
BETA       SUM=SUM+F(I)
           THROUGH EPS, FOR I=1,1,I.G.N
EPS        F(I)=F(I)/SUM
           PRINT FORMAT OUT3, F(1)...F(N)
GAMMA      PRINT FORMAT OUT4, SUM
           TRANSFER TO READ
          RFORMAT SPECIFICATIONS
           VECTOR VALUES INPUT1=$3I10*$
           VECTOR VALUES INPUT2=$(7F10.5)*$
           VECTOR VALUES OUT1=$3H1N=,I5,4H S=,I5,4H T=,I5/
          0(1H ,8G15.5)*$
           VECTOR VALUES OUT2=$7H0FOR T=,I2,10H PI(T,M)=,(6G15.5)*$
           VECTOR VALUES OUT3=$19H NORMALIZED VECTOR=,(6G15.5)*$
           VECTOR VALUES OUT4=$1H ,S11,7HC(T,M)=,G15.6*$
           END OF PROGRAM








           EXTERNAL FUNCTION (J1,T1,M)
           ENTRY TO PI.
          RTHIS FUNCTION RECURSIVELY COMPUTES PI(J1,T1,M), THE T1TH
          RSUCCESSIVE APPROXIMANT TO THE J1TH ELEMENT OF THE MEAN
          RSTEADY-STATE PROBABILITY VECTOR WHEN THE PRIOR IS MATRIX
          RBETA WITH PARAMETER M.
           PROGRAM COMMON IND,N,LIST,MDIM,N1,ADIM,AIND
           INTEGER J1,T1,I,J,K,N,T,IND,MDIM,N1,ADIM,AIND
           DIMENSION IND(10),LIST(21000),MDIM(2),TM(100,MDIM),PBAR(10),
          0ADIM(2),AIND(10)
           J=J1
           T=T1
           THROUGH ALFA, FOR K=1,1,K.G.N
           MSUM=0.
           THROUGH BETA, FOR I=1,1,I.G.N
BETA       MSUM=MSUM+M(IND(K)+I)
ALFA       PBAR(K)=M(IND(K)+J)/MSUM
           Y=0.
           THROUGH GAMMA, FOR K=1,1,K.G.N
           SAVE RETURN
           SAVE DATA Y,T,PBAR(K)...PBAR(N),K,J,M(1,1)...M(N,N)
           M(IND(K)+J)=M(IND(K)+J)+1.
           WHENEVER T.G.1, TRANSFER TO ZETA
           X=PIZRO.(K,M)
           TRANSFER TO ETA
ZETA       X=PI.(K,T-1,M)
ETA        RESTORE DATA M(N,N)...M(1,1),J,K,PBAR(N)...PBAR(K),T,Y
           RESTORE RETURN
GAMMA      Y=Y+X*PBAR(K)
           FUNCTION RETURN Y
           END OF FUNCTION


           EXTERNAL FUNCTION (L1,M)
           ENTRY TO PIZRO.
          RTHIS FUNCTION COMPUTES THE TERMINAL FUNCTION PI(L1,0,M)
          RAS THE L1TH ELEMENT OF THE STEADY-STATE PROBABILITY VECTOR
          RCORRESPONDING TO THE MEAN OF THE PRIOR DISTRIBUTION.
          RPRIOR IS MATRIX BETA WITH PARAMETER M.
           PROGRAM COMMON IND,N,LIST,MDIM,N1,ADIM,AIND
           INTEGER L1,L,N,I,J,K,N1,IND,MDIM,ADIM,AIND
           DIMENSION IND(10),LIST(21000),MDIM(2),ADIM(2),A(110,ADIM),
          0AIND(10)
           L=L1
           THROUGH ALFA, FOR I=1,1,I.G.N
           MSUM=0.
           THROUGH BETA, FOR K=1,1,K.G.N
BETA       MSUM=MSUM+M(IND(I)+K)
           THROUGH GAMMA, FOR K=1,1,K.E.N
GAMMA      A(AIND(K)+I)=M(IND(I)+K)/MSUM
           A(AIND(N)+I)=1.
ALFA       A(AIND(I)+N1)=0.
           A(AIND(N)+N1)=1.
           THROUGH DELTA, FOR K=1,1,K.E.N
           A(AIND(K)+K)=A(AIND(K)+K)-1.
           SCRAP=A(AIND(K)+L)
           A(AIND(K)+L)=A(AIND(K)+N)
DELTA      A(AIND(K)+N)=SCRAP
           DIAG=A(AIND(1)+1)
           THROUGH EPS, FOR J=2,1,J.G.N
EPS        A(AIND(1)+J)=A(AIND(1)+J)/DIAG
           THROUGH ZETA, FOR J=2,1,J.G.N
           THROUGH ETA, FOR I=J,1,I.G.N
           SUB=A(AIND(I)+J)
           THROUGH IOTA, FOR K=1,1,K.E.J
IOTA       SUB=SUB-A(AIND(I)+K)*A(AIND(K)+J)
ETA        A(AIND(I)+J)=SUB
           DIAG=A(AIND(J)+J)
           THROUGH ZETA, FOR I=J+1,1,I.G.N1
           SUB=A(AIND(J)+I)
           THROUGH LAMBDA, FOR K=1,1,K.E.J
LAMBDA     SUB=SUB-A(AIND(J)+K)*A(AIND(K)+I)
ZETA       A(AIND(J)+I)=SUB/DIAG
           FUNCTION RETURN A(AIND(N)+N1)
           END OF FUNCTION
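PI and PIZRO can be sketched in Python in the same way. PI uses π̄_j(t, M) = Σ_k (m_kj / m_k) π̄_k(t−1, M + e_kj), and the terminal function π̄_k(0, M) is the kth element of the steady-state vector of the mean matrix. PIZRO obtains that vector by Crout reduction after reordering columns; the sketch below substitutes plain Gauss-Jordan elimination over fractions, which yields the same solution for a nonsingular system. Names and 0-based indexing are assumptions.

```python
from fractions import Fraction


def steady_state(pbar):
    """Solve pi = pi P with sum(pi) = 1 (the last balance equation is replaced)."""
    n = len(pbar)
    # row k < n-1:  sum_i pbar[i][k] * pi_i - pi_k = 0 ;  last row:  sum_i pi_i = 1
    a = [[pbar[i][k] - (1 if i == k else 0) for i in range(n)] for k in range(n - 1)]
    a.append([1] * n)
    b = [0] * (n - 1) + [1]
    aug = [[Fraction(x) for x in row] + [Fraction(v)] for row, v in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def pizro(idx, m):
    """Terminal function: idx-th steady-state probability of the mean matrix."""
    pbar = [[Fraction(c, sum(row)) for c in row] for row in m]
    return steady_state(pbar)[idx]


def bump(m, k, j):
    """Copy of m with m[k][j] increased by one (observed transition k -> j)."""
    return tuple(
        tuple(c + (1 if (a, b) == (k, j) else 0) for b, c in enumerate(row))
        for a, row in enumerate(m)
    )


def pi_approx(j, t, m):
    """t-th successive approximant (t >= 1) to the j-th mean steady-state probability."""
    total = Fraction(0)
    for k in range(len(m)):
        weight = Fraction(m[k][j], sum(m[k]))  # PBAR(K) in the listing
        m_post = bump(m, k, j)
        x = pizro(k, m_post) if t == 1 else pi_approx(k, t - 1, m_post)
        total += weight * x
    return total
```

For M = ((1, 1), (1, 1)) both first approximants equal 18/35, so they sum to 36/35 rather than 1; this is why the main program also prints the normalized vector and the sum C(T, M).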
APPENDIX E


PROGRAM VASYMP TO COMPUTE
EQUATION (4.3.13).


          RPROGRAM NAME IS VASYMP.
          RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF V(I,J,M) FOR
          RI=1,...,N, J=S,...,T. THE REWARD MATRIX IS R AND THE TERM-
          RINAL REWARD VECTOR IS RHO. A MATRIX BETA PRIOR IS ASSUMED.
          RTHE DISCOUNT FACTOR IS BETA.
           PROGRAM COMMON R,RHO,RDIM,IND,N,BETA,LIST,TOP
           INTEGER N,IND,J,S,T,K,L,TOP
           DIMENSION R(100,RDIM),M(100,RDIM),RHO(10),IND(10),V1(10),
          0V2(10),LIST(21000)
           VECTOR VALUES RDIM=2,1,20
READ       READ FORMAT INPUT1, N,S,T,BETA
           RDIM(2)=N
           IND(1)=0
           THROUGH ALFA, FOR K=1,1,K.E.N
ALFA       IND(K+1)=K*N
           READ FORMAT INPUT2, R(1,1)...R(N,N), M(1,1)...M(N,N),
          0RHO(1)...RHO(N)
           PRINT FORMAT OUT1A, BETA
           PRINT FORMAT OUT1B, (K=1,1,K.G.N,(L=1,1,L.G.N, M(IND(K)+L)))
           PRINT FORMAT OUT1C, (K=1,1,K.G.N,(L=1,1,L.G.N, R(IND(K)+L)))
           PRINT FORMAT OUT1D, (K=1,1,K.G.N, RHO(K))
           THROUGH PHI, FOR K=1,1,K.G.N
PHI        V2(K)=0.
           SET LIST TO LIST
           LIST=0
           THROUGH DELTA, FOR K=S,1,K.G.T
           THROUGH GAMMA, FOR L=1,1,L.G.N
           V1(L)=V.(L,K,M)
GAMMA      V2(L)=V1(L)-V2(L)
           PRINT FORMAT OUT2, K, V1(1)...V1(N)
           PRINT FORMAT OUT3, V2(1)...V2(N)
           THROUGH DELTA, FOR L=1,1,L.G.N
DELTA      V2(L)=V1(L)
           TRANSFER TO READ
          RFORMAT SPECIFICATIONS
           VECTOR VALUES INPUT1=$3I10,F10.5*$
           VECTOR VALUES INPUT2=$(7F10.5)*$
           VECTOR VALUES OUT1A=$8H1BETA  =,G15.5*$
           VECTOR VALUES OUT1B=$8H0M     =,8G15.5/(1H ,8G15.5)*$
           VECTOR VALUES OUT1C=$8H0R     =,8G15.5/(1H ,8G15.5)*$
           VECTOR VALUES OUT1D=$8H0RHO   =,8G15.5/8G15.5*$
           VECTOR VALUES OUT2=$8H0FOR T =,I2,14H    V(I,T,M) =,
          07G15.5/8G15.5*$
           VECTOR VALUES OUT3=$S3,21HDELTA V(I,T,M) =,7G15.5/
          08G15.5*$
           END OF PROGRAM


           EXTERNAL FUNCTION (I1,J1,M)
           ENTRY TO V.
          RTHIS FUNCTION RECURSIVELY COMPUTES V.(I1,J1,M)=Y, THE TOTAL
          REXPECTED DISCOUNTED RETURN IN J1 STEPS IF THE SYSTEM STARTS
          RIN STATE I1 WITH PARAMETER MATRIX M. PRIOR IS MATRIX BETA.
           PROGRAM COMMON R,RHO,RDIM,IND,N,BETA,LIST,TOP
           INTEGER I1,J1,I,J,K,IND,N,RDIM,TOP
           DIMENSION R(100,RDIM),RHO(10),RDIM(2),IND(10),LIST(21000),
          0TM(100,RDIM)
           I=I1
           J=J1
           WHENEVER J.E.0, FUNCTION RETURN RHO(I)
           MSUM=0.
           THROUGH ALFA, FOR K=1,1,K.G.N
ALFA       MSUM=MSUM+M(IND(I)+K)
           Y=0.
           THROUGH GAMMA, FOR K=1,1,K.G.N
           SAVE RETURN
           SAVE DATA J,Y,MSUM,M(I,K)...M(I,N),I,K
           EXECUTE TR.(I,K,M,TM)
           X=V.(K,J-1,TM)
           RESTORE DATA K,I,M(I,N)...M(I,K),MSUM,Y,J
           RESTORE RETURN
GAMMA      Y=Y+(M(IND(I)+K)/MSUM)*(R(IND(I)+K)+BETA*X)
           FUNCTION RETURN Y
           END OF FUNCTION
           EXTERNAL FUNCTION (I,K,M,TM)
           ENTRY TO TR.
          RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
          RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
          RTR.(I,K,M)=TM, WHEN ONE TRANSITION FROM I TO K IS OBSERVED.
          RPRIOR IS MATRIX BETA.
           PROGRAM COMMON R,RHO,RDIM,IND,N,BETA,LIST,TOP
           DIMENSION R(100,RDIM),RHO(10),RDIM(2),IND(10),LIST(21000)
           INTEGER I,K,IND,J,L,N
           THROUGH ALFA, FOR J=1,1,J.G.N
           THROUGH ALFA, FOR L=1,1,L.G.N
ALFA       TM(IND(J)+L)=M(IND(J)+L)
           TM(IND(I)+K)=TM(IND(I)+K)+1.
           FUNCTION RETURN
           END OF FUNCTION
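The function V of Appendix E is the single-alternative case of the recursion in Appendix B: V(i, J, M) = Σ_k (m_ik / m_i)(r_ik + β V(k, J−1, M + e_ik)), with V(i, 0, M) = ρ_i. The main program prints V(i, T, M) together with its change from the previous horizon (the first change is taken from zero). A Python sketch of both steps, with illustrative names, 0-based indices, and exact fractions:

```python
from fractions import Fraction


def bump(m, i, k):
    """TR step: copy of m with m[i][k] increased by one."""
    return tuple(
        tuple(c + (1 if (a, b) == (i, k) else 0) for b, c in enumerate(row))
        for a, row in enumerate(m)
    )


def v(i, steps, m, r, rho, beta):
    """Expected discounted return in `steps` transitions starting in state i."""
    if steps == 0:
        return rho[i]  # terminal reward
    msum = sum(m[i])
    return sum(
        Fraction(m[i][k], msum)
        * (r[i][k] + beta * v(k, steps - 1, bump(m, i, k), r, rho, beta))
        for k in range(len(m))
    )


def successive_differences(i, s, t, m, r, rho, beta):
    """Changes printed by VASYMP for T = s, ..., t (the first one is V(i, s, M) - 0)."""
    out, prev = [], 0
    for steps in range(s, t + 1):
        cur = v(i, steps, m, r, rho, beta)
        out.append(cur - prev)
        prev = cur
    return out
```

When every reward equals 1 and ρ = 0, V(i, T, M) = 1 + β + ... + β^(T−1) whatever the prior, so the printed changes are exactly β^(T−1); with β = 1/2 and T = 1, 2, 3 they are 1, 1/2, 1/4.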